THE LOST TAGS OF HTML

  1. Nature and purpose of this document
  2. Compliance with the various SGML or XML models of HTML
  3. Sources and Structure of Findings
  4. Technical history of HTML - an Introduction
  5. "HTML 0.a" A Prototype HTML, The Very Beginning
  6. "HTML 0.c" The Earliest Documented HTML
  7. "HTML 0.d" Making the NeXT Editor SGML Compliant
  8. "HTML 0.h" The Oldest Surviving Usable DTD
  9. "HTML 1.k" A First Draft Published for HTML 1
  10. "HTML 1.m" A More Advanced HTML 1 Version
  11. A Brief note on HTML+
  12. HTML 2.0: The Oldest Recognized W3C Version
  13. HTML 3.2: A Much Richer Standard, and a Road Not Taken
  14. HTML 4.01: A Mature and Stable Markup Language
  15. XHTML 1.0: Changing the Basis from SGML to XML
  16. XHTML 1.1: Leaving the Past Behind
  17. A Brief note on HTML 5.0 and XHTML 2.0

"It often happens that the mind of a person who is learning a new science, has to pass through all the phases which the science itself has exhibited in its historical evolution." - Stanislao Cannizzaro, Italian chemist, 1826 - 1910

I think the most truly seminal moment in the history of the internet and the World Wide Web occurred on that fateful day in 1990 when the decision-makers at CERN in Switzerland accepted a proposal written by Tim Berners-Lee regarding a new networking scheme, commissioned him to begin working on this project, and provided him with a black, cube-shaped computer with a NeXT editor and other related software on it. Or perhaps it was that day not long after when Mr. Berners-Lee, using that computer, constructed the first working HTTP server and web page featuring an inter-document hyperlink. Though hyperlinks had been mathematically speculated upon for decades, this was the first working instance of one capable of linking assets even across a computer network. Per his proposal, his concern was to provide a way to index and connect a large and growing number of academic papers being authored by the various scientists at CERN, and by other scientists around the world with whom they were in communication through the small but growing international computer network.

Anyone familiar with peer reviewed academic papers knows that one common motif within virtually all of them is the footnote, providing a reference to some previous academic paper, usually by someone else, to supply a quote or basis for the subject matter being discussed at that point. In conventional research as it was known back then, if a reader of such a paper wanted to know about something being quoted or referred to, he would look up its title in the footnote, and then go to a library and have the reference clerk there track down the cited article. This could take hours (if the library had the desired article on hand), or days, or even weeks (if it didn't). Things were already improving, in that computers were being connected up in a small but rapidly expanding network, and most current academic papers of significance were being made available over that network. But even there it could take hours to find what one was looking for and, if a paper was large, to find the desired quote or reference within that paper.

In that day the point-and-click, mouse-oriented user interface was still very much in its infancy, but it was a known idea. Several point-and-click Graphical User Interfaces were already being marketed. "What if," one can just see Mr. Berners-Lee thinking to himself, "one could point at a footnoted reference, click on it, and BAM! up comes the particular referenced article, and even the correct page within the article where the relevant topic is discussed or the quote comes from?" After all, at least the author of such an academic article would have looked up his own references and would know where to find them, so once he knows where they are to be found on the network, that information could be saved as (for example) some host_and_file_name field, and a computer program ("user agent," or "browser") would be able to use that information to fetch the desired article over the network with computerized speed and efficiency.

And it would not require all that much time, despite the slowness (as we would see it today) of the network in 1990. Bear in mind that in 1990, data transfer rates were measured in "baud." A common baud rate found in that era was 9600. Sometimes one could get data transferred at 19200 baud, but that was still rare and error-prone, and some nodes on the network were limited to far lower baud rates. On the other hand, an academic paper did not have the additional garbage (graphics, applets, advertising banners, and just overall lengthy and cumbersome formatting tags) that so dominates virtually all web pages today. Papers would be formatted as raw ASCII text files, and look very much like this online copy of RFC 1111. So even at 9600 baud, a large article like this could still download within a matter of seconds, far faster and easier than going to the library, or even having to track it down on the network and then use ftp: or telnet: to download it or look at it, as was the common practice in 1990 when using the computer network for academic or scientific research.
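As a rough check on the "matter of seconds" claim, the arithmetic can be sketched as follows. The figures are assumptions for illustration and do not come from any record of the period: each 8-bit character is framed with one start bit and one stop bit (10 bits on the wire per character), one symbol carries one bit so that baud equals bits per second, and the paper is a hypothetical 20,000 characters long.

```python
# Rough transfer-time estimate for a plain ASCII paper over a serial link.
# Assumptions (not from the original text): 8N1 framing, i.e. 10 bits on the
# wire per character, and one bit per symbol so that baud == bits per second.

BITS_PER_CHAR = 10  # 1 start bit + 8 data bits + 1 stop bit


def transfer_seconds(size_chars: int, baud: int) -> float:
    """Return the ideal transfer time in seconds, ignoring protocol overhead."""
    return size_chars * BITS_PER_CHAR / baud


if __name__ == "__main__":
    paper_size = 20_000  # hypothetical paper of about 20,000 characters
    for rate in (2400, 9600, 19200):
        print(f"{rate:>5} baud: {transfer_seconds(paper_size, rate):.1f} s")
```

Under these assumptions the hypothetical paper takes roughly 21 seconds at 9600 baud, and about four times as long on a 2400 baud line.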

As there was little to no software for "Web programming," much of this initial HTML research and experimentation was done using the NeXT editor or else some simple ASCII text editors. HTML was meant to be easy to write, simple in what it does and how it does it. The practical upshot of that background is the interesting result that HTML is the easiest "computer language" to write in, by far. It is so easy that just any Joe Sixpack can pick up a basic HTML How-To book, and with a computer, internet connection, and simple text editor, he can be writing and posting presentable HTML web pages within half an hour.

Unfortunately, as HTML has advanced, this simple charm of the easy-to-master HTML language has gradually been lost, as HTML not only acquires features but becomes more and more dependent upon them, and as the basic features that made it so easy are done away with and replaced with more advanced and versatile ones. The Joe Sixpack who can be coding up presentable HTML 2.0 so quickly would be at quite a loss trying to do the same with XHTML 2.0 in anything like the same short period, even once there should actually come to be such a thing as an official published XHTML 2.0. The more HTML is made compatible with and similar to other file formats, such as Microsoft Word documents (.doc), Rich Text Format (.rtf), or Portable Document Format (.pdf), the less human readable it becomes, and the more the document writer becomes dependent upon special editors to do all of the low level stuff. Correspondingly, the files themselves also become less compact as more and more sophisticated and obscure (and lengthy) non-text data comes to predominate, to say nothing of the way these editors generate inefficient and wordy code to begin with.

There is also the question of whether these HTML features should be called "elements" or "tags." Originally, the custom was to call them "tags," but as HTML as a language got tightened up and formally documented, the alternate expression "elements" seems to have displaced it. For a while there was a tug of war between these two terms, but in the end the consensus seems to be that one uses the word "element" to refer to the item in general, as a Platonic or philosophical ideal, or (per the Document Object Model and style sheet functionality) to the whole package of an opening tag, the closing tag (whether present or implied by context), and all the material between them, and the word "tag" to refer to an occurrence of an element, or more particularly its opening and/or closing tags, within an HTML file. So, for example, one would say, "The <A> element serves to provide a link to another document, or another portion of the same document," or else "This <A> element links the phrase XYZ to the document ABC," and "Here is the problem; you forgot to put a closing </A> tag here in your file." In this treatise, I will tend to lean towards the older and more colloquial usage of "tag" in most places. Fortunately, the term for the other relevant HTML feature which I also discuss, the "attributes," has never gone through such name change confusions.

Here Lies <TAG> R. I. P.

The approach of this online HTML treatise is to focus mostly on those particular elements ("tags") which are gone, or at least going away, and to track in detail the comings and goings of the various elements and attributes across the versions of HTML. A special emphasis is given here to those many versions of HTML that precede HTML 2.0, with a breakdown of the many discrete versions of HTML, which is something that to my knowledge has not yet been attempted. Most modern HTML writings focus either on those elements with long-standing value (such as <A>, <TITLE>, <UL>, and <P>) which necessarily continue from the very earliest HTML to the (as yet) latest version, or on the brilliant and powerful new features which are being added. Little to no attention however gets paid to those other elements ("tags") which are falling by the wayside, and which altogether don't exist in XHTML 1.1 and later versions of XHTML, or else are tolerated in HTML5 only as backward compatibility for old files, and even less to those already declared obsolete in HTML 4.0. (One notable exception is HTML & XHTML - The Definitive Guide, by Chuck Musciano & Bill Kennedy, O'Reilly Media, Inc., which retains some of this early material.)

For myself, I find that the unique perspective of studying those tags and attributes which are deprecated, going away, or completely gone should prove to be an interesting approach to exploring the history and nature of HTML, and occasionally a few of the related technologies. Even so, I don't completely restrict my considerations to the lost tags, but also take this opportunity to explore some others of the more arcane and obscure features of HTML, the kind of stuff that receives little to no coverage, original material not available elsewhere. But this treatise is not meant merely to serve as a nostalgia trip down memory lane (though it may serve as that too). Rather, it is a chance to discuss the nature of technological change and progress, to see some of the academically clever ideas that fail to take hold in the marketplace, and conversely the ideas hastily implemented by the industry but with little or no standing with the academic community due to their misuse of what HTML was really meant to be about, to discuss backward-forward compatibility issues, and to understand some of the inner workings of HTML from a layman's perspective. Furthermore, support for these deprecated and obsolete tags and attributes continues, in many of the more contemporary browsers and user agents, to be handled with grave inconsistencies. A new generation of browser writers has arisen who have no real understanding of what these underdocumented tags and attributes were, what they did, or how they were meant to be used.

This is not meant to be the sort of "directed research" guide that provides direct and easy access to how to code up the latest new HTML widget. Rather, it is intended to serve as a kind of "non-directed" research, to inspire curiosity and interest, and perhaps to let one serendipitously come across some clever HTML idea. In short, this may not be what you want, but at least it should be interesting.

The structure of this work is that it all starts from this main file, which itself is a simple example of fully compliant XHTML 1.0 Strict, transmitted as Content-Type "text/html", and from which individual files discuss each of the specific tags, attributes, or topics of interest here. An XHTML 1.1 version of this file, transmitted as Content-Type "application/xhtml+xml", is located here. The subfiles are rendered in the various versions of HTML or XHTML, in most cases transmitted as Content-Type "text/html", and are themselves examples of the tag or attribute or topic in question. These files in turn discuss the history, usage, and recommended implementation of these various tags and attributes, and finally include further subfiles of their own as examples of upgrades and downgrades. The Upgrades and Downgrades discuss alternate means of achieving an effect similar to what the tag intends, using either more or less advanced features, and also various additional features added to the tag in later versions of HTML or even XHTML. Optionally, there may at times also be a discussion of some few selected "proprietary extensions" relating to the tag or version of HTML being discussed, but this consideration is by no means intended to be exhaustive.

Though the reasons for deleting these lost tags and attributes from the newer versions of HTML and XHTML are explained herein, that is not to say that such omissions are justified in browser implementations. The Joe Sixpack HTML programmers of the world are not going away anytime soon, and even now are applying pressure to the Web Consortium to please develop HTML and not merely focus on XHTML, since HTML programming is so much easier to do. These same folks also so often use older and simpler versions of HTML and its explanatory documents that it is best that the deleted tags and attributes continue to be implemented in browsers as many of them indeed continue to be. And if they are to continue to be implemented, I discuss here the optimal manner in which they ought to be implemented that best captures the spirit and intention of them as originally introduced.

This work can only exist online since it provides actual working examples of these lost tags and attributes. This serves to demonstrate what they were like, to enable one to test how their own browser handles these things, to show the full and proper context in which they function, and to do all this with files that conform to the various HTML standards. This is a "work in progress" as is all HTML, so subjects not treated of herein as yet will be added as time progresses.

Compliance with SGML or XML models of HTML

With the two exceptions of a) examples that feature the various proprietary extensions, or b) some very early tags and structures which disappeared before HTML 2, I have endeavored to see to it that all examples herein comply fully with the World Wide Web Consortium's standards. Each file can be validated by using a link at the bottom (or else a special link to its validation in its subfile above), which not only validates the file but also shows its source text exactly as validated. The following standards are variously used, depending on the file:

In the case of those examples that feature proprietary extensions, I have seen to it that with the one exception of the particular proprietary tag, attribute, attribute value, or nesting of the tags that cannot comply with any of these standards, the file is nevertheless otherwise in compliance with one or more of these standards. In the case of certain tags and attributes that vanished prior to HTML 2.0, I have written it to the standard that introduced the tag or attribute, or else in a "standard" which appears to reflect the nature of HTML at a given time, as captured in DTDs that I have written.

Many HTML guides out there showcase examples of at least some of the simpler and commoner tags which display reasonably on most browsers. Here we try to showcase the usage of somewhat more complicated tags, and also take care to see that each example complies with whichever HTML (or XHTML) standard it needs to comply with, or specifically note where it does not or cannot comply.

One convention I use here is that every file which is compliant with some version of HTML will begin with the <!DOCTYPE> declaration appropriate to the version of HTML it complies with. Those files with proprietary details which fail to conform will lack this declaration, but the link I provide to validate (and fail) the file will also automatically specify the standard I otherwise wrote it to. Though I call many of these files "Working Examples," in some cases there is no discernible effect in the display or operation of the page. In such cases, the only thing that "works" might be the ability to validate the file containing the rare and otherwise almost never seen element. The <!DOCTYPE> declaration itself however is not a part of HTML but rather of SGML, the basic language behind HTML. Each version of HTML (except some of the earliest) is described in a special text file called a DTD (Document Type Definition). A DTD is written in a special machine-and-human-readable SGML format which specifies the details of the version of HTML being validated. A couple of small parts of the DTD for HTML 2.0 would look like this:

<!--    Modified for use in HTML        
$Id: ISOlat1.sgml,v 1.2 1994/11/30 23:45:12 connolly Exp $ -->
<!ENTITY AElig  CDATA "&#198;" -- capital AE diphthong (ligature) -->
<!ENTITY Aacute CDATA "&#193;" -- capital A, acute accent -->
<!ENTITY Acirc  CDATA "&#194;" -- capital A, circumflex accent -->

<!ENTITY % linkType "NAMES">
<!ENTITY % linkExtraAttributes
        "REL %linkType #IMPLIED
        REV %linkType #IMPLIED
        URN CDATA #IMPLIED
        TITLE CDATA #IMPLIED
        METHODS NAMES #IMPLIED
        ">
<!ELEMENT LINK - O EMPTY>
<!ATTLIST LINK
        HREF CDATA #REQUIRED
        %linkExtraAttributes;
        %SDAPREF; "Linked to : #AttVal (TITLE) (URN) (HREF)>"    >
<!-- <LINK>         Link from this document         -->
<!-- <LINK href="...">  Address of link destination           -->
<!-- <LINK URN="...">   Lasting name of destination             -->
<!-- <LINK REL=...>     Relationship to destination         -->
<!-- <LINK REV=...>     Relationship of destination to this         -->
<!-- <LINK TITLE="..."> Title of destination (advisory)         -->
<!-- <LINK METHODS="..."> Operations allowed (advisory)         -->

One can see from this how the <!DOCTYPE> declaration found in an HTML file is actually an SGML command, one which tells the validator which particular version of HTML is being scanned. Illustrated above are the SGML DTD file entries to specify that the expressions &AElig;, &Aacute; and &Acirc; will result in Æ, Á, and Â respectively, and that the <LINK> tag takes the attributes of HREF, URN, REL, REV, TITLE, and METHODS. It also specifies that the HREF attribute is required (must be present in any occurrence of the <LINK> tag). So, for example, the HTML validator would look at this file to know the language details of the particular version of HTML you are validating to, and then scan your file. If it found in an HTML file a <LINK> tag with some other attribute, not on that list, or else a <LINK> tag which is lacking the HREF attribute, it would flag that as an error. So you can see that just as <!DOCTYPE> is used to specify the type of file you have (a bit of SGML in an otherwise HTML file), <!ENTITY> in the SGML DTD file can specify a special character display command (as it does here) or a DTD-internal string definition (as also seen with the linkType declaration above), <!ELEMENT> specifies a tag type, and <!ATTLIST> specifies the list of attributes the particular tag (or "element") takes. Such entities as "&lt;" are themselves actually SGML constructs. Another bit of SGML commonly seen in HTML files is the comment (<!-- The contents of this wouldn't display. -->), and occasionally other SGML structures may turn up as well. Many such other sorts of SGML-distinct items may not display well in most user agents, as for example demonstrated here.
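The two checks just described for the <LINK> tag (no undeclared attributes, HREF always present) can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the attribute sets are copied by hand from the attributes named in the text rather than read from the DTD, the SDAPREF line of the declaration is ignored, and the function name `check_link_attributes` is hypothetical. A real validator works by parsing the DTD with a full SGML parser.

```python
from __future__ import annotations

# Toy model of what the <!ATTLIST LINK ...> declaration lets a validator check.
# The two sets are hand-copied from the attributes named in the text; a real
# SGML parser would derive them from the DTD itself.
LINK_ALLOWED = {"HREF", "REL", "REV", "URN", "TITLE", "METHODS"}
LINK_REQUIRED = {"HREF"}


def check_link_attributes(attrs: dict[str, str]) -> list[str]:
    """Return a list of error messages for the attributes of one <LINK> tag."""
    # Attribute names are compared without regard to case.
    names = {name.upper() for name in attrs}
    errors = [
        f"attribute {name} is not declared for LINK"
        for name in sorted(names - LINK_ALLOWED)
    ]
    errors += [
        f"required attribute {name} is missing"
        for name in sorted(LINK_REQUIRED - names)
    ]
    return errors


if __name__ == "__main__":
    print(check_link_attributes({"HREF": "Overview.html", "REL": "precedes"}))
    print(check_link_attributes({"REL": "precedes", "COLOR": "red"}))
```

The first call reports no errors; the second reports both an undeclared attribute and the missing HREF.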

There does exist a workaround by which some of the more primitive types of files can be validated, and that is by using special document definition files to describe these more ancient versions of HTML. I have made some of these available using the <!DOCTYPE> declarators mentioned below. Note that when the W3C validating engine is used with any of the older DTDs it will issue a warning to the effect that the Document Type is not in the validator's catalog. I have a small note here regarding details of the DTDs that I have prepared or adapted for these unofficial versions of HTML, so that they can be experimented with in detail.

Sources and Structure of Findings

There are three basic sources of information as to the earliest versions of HTML. The first and most difficult to use would be the vast store of actual files from the early periods as archived in the Web Consortium's own historic archive. For this one can only scan a large number of files (or as many as can be found to date from a period of time) to glean the actual nature of the particular variant of HTML in use during said period. The second source would be those few attempts to document HTML as it stood at this or that particular point in time, or else to suggest or propose various extensions or refinements to HTML. The third and most reliable source would be the actual DTD drafts by which various early versions of HTML are precisely defined. There was not at that time any attempt to track or identify any particular "versions" of HTML, and there is no official "HTML 1," but the HTML draft published in mid-1993 is often taken as being a kind of "HTML 1." I draw the line between HTML 0 and 1 on the basis that only with HTML 1 were the DTDs being specifically associated with, and intended to define most formally, what was or was not in the current working draft of HTML. In HTML 0, though there existed some DTDs in the latter part of the HTML 0 period, they existed mostly as research exercises, running only in parallel to the general run of HTML as used and intended. I have assigned continually ascending letters to the versions, leaving room for some few additional versions to turn up, with the additional provisions that a through e are for those which have no DTD but are known through actual files or general descriptive documentation only, that f and beyond are for the DTD-based versions of HTML, that k and beyond are for HTML version 1, and that p and beyond are for HTML version 1+ ("HTML+").

As a result, I have identified a number of "versions" of HTML which existed prior to HTML 2, many of which I document here in some detail. I believe there were probably other versions besides these, which may turn up someday. Since there was no strict versioning mechanism prior to HTML 2, I have arbitrarily assigned the following designations to the various "HTML's" preceding HTML 2 that I have identified (those in bold are described in detail below; the two surviving versions of HTML+ and the Draft of HTML 3.0 are also given some description here):

  1. HTML 0.a - That most primitive prototype of HTML used by Tim Berners-Lee and a few of his closest associates in "The Project" from the very beginning in October 1990 to mid-January 1991.
  2. HTML 0.b - Unassigned, but it could be seen as a kind of variant of HTML 0.a seen towards the very end of its period, in which <P> briefly goes from being a separator tag to a container tag, but after which it goes back to being a separator tag until the second release of HTML 1, and in which <OL> seems to be at least deprecated.
  3. HTML 0.c - That form of HTML that spans a period from January 23, 1991 through November 23, 1992, and for which documentation was written in its own version of HTML towards the end of that period.
  4. HTML 0.d - That form of HTML that spans a period from November 26, 1992 through May 24, 1993 (or thereabouts), most specifically characterized by the use of the <HEADER> tag to set apart its head.
  5. HTML 0.e - That form of HTML which Dan Connolly described in late November 1992, and wrote some few files in, which frequently contains the <TYPEWRITER> tag and/or a <NEXTID> attribute called ID.
  6. HTML 0.f - The HTML of the oldest surviving DTD from August, 1992.
  7. HTML 0.g - That DTD (now lost) which was pointed to by the November 1992 documentation of HTML; I conjecture that this DTD might have called out the <TYPEWRITER> tag before going to <PRE> as seen in the next version.
  8. HTML 0.h - The HTML of the oldest usable DTD from January, 1993.
  9. HTML 0.i and 0.j - unassigned, room for possible expansion.
  10. HTML 1.k - The first released draft of HTML ("HTML 1") in June, 1993
  11. HTML 1.l - unassigned, might apply to a variant of HTML 1.k that fixes differences from as described in accompanying text, for example adding ALT and ALIGN attributes to the <IMG> tag, or possibly a first introduction of the <META> tag.
  12. HTML 1.m - The second draft of "HTML 1" in May, 1994.
  13. HTML 1.n and 1.o - unassigned, room for possible expansion.
  14. HTML 1.p - Lost early draft of HTML+ from late-mid 1993.
  15. HTML 1.q - Oldest surviving draft of HTML+ in November, 1993.
  16. HTML 1.r - unassigned, room for possible intermediate version of HTML+.
  17. HTML 1.s - Latest surviving draft of HTML+ in April, 1994.
  18. HTML 1.t through 1.z - unassigned, room for possible expansion.
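The lettering scheme used in the list above (a through e, f through j, k through o, p through z) can be summarized as a small lookup. This is a minimal sketch of my reading of that scheme; the function name `classify_suffix` and the wording of the returned labels are hypothetical and appear nowhere in the historical record.

```python
def classify_suffix(letter: str) -> str:
    """Map the letter of an unofficial designation (e.g. the "h" in
    "HTML 0.h") to the group of pre-HTML 2 versions it belongs to."""
    letter = letter.lower()
    if len(letter) != 1 or not "a" <= letter <= "z":
        raise ValueError(f"not a version letter: {letter!r}")
    if letter <= "e":
        return "HTML 0, no DTD (known from files or descriptions only)"
    if letter <= "j":
        return "HTML 0, DTD-based"
    if letter <= "o":
        return "HTML 1"
    return "HTML+"


if __name__ == "__main__":
    for designation in ("0.a", "0.h", "1.k", "1.q"):
        # The letter after the dot carries the grouping information.
        print(designation, "->", classify_suffix(designation[-1]))
```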

After these come the official (and sometimes not-so-official) "versions" of HTML, and again those in bold are treated in detail below:

  1. HTML 2
  2. HTML 3.0
  3. HTML 3.2
  4. HTML 4.0
  5. HTML 4.01
  6. XHTML 1.0
  7. XHTML 1.1
  8. HTML 5.0
  9. XHTML 2.0

In these quick listings of the tags (elements) and attributes of each version of HTML I am using here, I have utilized the following color scheme to represent the various categories of the state of each tag and of each attribute, so as to make comparison between the various versions easier at a glance. These are the colors used:

The algorithm used in calculating the color is thus: If a tag or attribute is being newly introduced with the version being given here (I disregard whether any intermediary version, such as HTML+, HTML 3.0, or HTML 4.0, may have introduced it), it will be either Green or Chartreuse, depending on the next question. If a tag or attribute is any of a) listed or described in the W3C document version as being "deprecated" (or even "obsolete" as seen in some earlier texts), but still present in the language, or b) not present in the "strict" form of the language, or c) omitted from the next version of the language, or d) confined to a "Legacy Module" (in XML versions), or any combination of the above, it will be Red or Chartreuse, depending on the previous question. If a tag or attribute, present in any previous version of HTML, is missing from the present version, it is Faded Blue. If none of the above conditions apply, it is Black.
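The color rules above amount to a short decision procedure, sketched here in Python. The function name `tag_color` and its four boolean parameters are hypothetical names introduced for illustration; `on_the_way_out` stands for "any of conditions a) through d) holds." Returning `None` for a tag that is neither present nor previously seen is my own assumption, since such a tag would simply not be listed.

```python
from typing import Optional


def tag_color(
    present: bool,
    newly_introduced: bool,
    on_the_way_out: bool,
    in_earlier_version: bool,
) -> Optional[str]:
    """Return the listing color for one tag or attribute in one HTML version.

    on_the_way_out is True when the item is deprecated or obsolete, absent
    from the strict form, omitted from the next version, or confined to a
    Legacy Module.
    """
    if not present:
        # Missing now but seen before: Faded Blue. Never seen: not listed.
        return "Faded Blue" if in_earlier_version else None
    if newly_introduced and on_the_way_out:
        return "Chartreuse"
    if newly_introduced:
        return "Green"
    if on_the_way_out:
        return "Red"
    return "Black"


if __name__ == "__main__":
    print(tag_color(True, True, False, False))   # new and staying
    print(tag_color(True, False, True, True))    # old and on its way out
    print(tag_color(False, False, False, True))  # gone from this version
```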

A Technical history of HTML

Described below are the versions of HTML which I have identified and discussed in detail. The earliest three versions that I have identified have been based upon the store of ancient HTML files found in the historical archives of the World Wide Web Consortium (W3C) website, looking in detail at the nature of the HTML contained in the files and tracing the historical sequence by using the date-modified information contained in the HTTP headers for each file, as archived by the W3C. These ancient files contain a vast repository of the earliest efforts of Tim Berners-Lee, Dan Connolly, and others. By observing these files, I have attempted to construct what I call "simulated" HTML DTDs for each of them. As it turns out, there are two problems with the earliest two of the versions that render it impossible to validate them by SGML. For reference and ease of distinction I have given these first three attempted versions the names of HTML 0.a, HTML 0.c, and HTML 0.d (only the DTD for this last one can work).
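Ordering archived files by their date-modified header can be sketched with Python's standard library alone. This is an illustration rather than a record of the method actually used: the function name `sort_by_last_modified` and the file names are hypothetical, and the second date is invented for the example. The first date is the one this treatise reports for the oldest surviving file.

```python
from __future__ import annotations

from email.utils import parsedate_to_datetime


def sort_by_last_modified(headers: dict[str, str]) -> list[str]:
    """Return file names ordered from oldest to newest Last-Modified value.

    headers maps a file name to the raw text of its Last-Modified header.
    """
    return sorted(headers, key=lambda name: parsedate_to_datetime(headers[name]))


if __name__ == "__main__":
    # Hypothetical file names; only the 1990 date comes from the treatise.
    sample = {
        "later.html": "Wed, 23 Jan 1991 10:00:00 GMT",
        "oldest.html": "Tue, 13 Nov 1990 15:17:00 GMT",
    }
    for name in sort_by_last_modified(sample):
        print(name, sample[name])
```

A header only records the last change, so a file that was retouched later will sort later than its original creation date would suggest.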

In addition, I have found six DTDs older than the HTML 2 DTD. The oldest surviving DTD I have found is from August 20, 1992 and it was written by Dan Connolly. This early DTD is far too incomplete to regard as a working version of HTML, though it did point to some future directions HTML would soon take. For reference I call it HTML 0.f. The next DTD to survive was also written by Dan Connolly, on January 20, 1993. For reference I call it HTML 0.h. Next came the first released draft for an official HTML 1 DTD, published (on the web) on July 01, 1993 (which I call HTML 1.k), and a second release of HTML 1 which occurred on May 18, 1994 (which I call HTML 1.m). Finally, there are two HTML+ DTDs, one from 1993 (but last modified in April 1994), and the other from about April 1994 (but last modified in 2000). These DTDs, written by Dave Raggett, follow a very different approach (more akin to that of HTML 3.0 than HTML 2), cannot be logically fitted into a sequential position, and introduce much that is not seen in any other version of HTML.

There are reports of DTDs being in work by various people as early as May 1992. Dan Connolly's August DTD mentions a first pass at making a DTD dated 15 July, and the November 1992 descriptions and discussions of HTML pointed to a DTD which was current at that time (but is now lost). The pointer is still active, but points to a file last modified in late 1995 which contains what is essentially a penultimate draft of HTML 2. I doubt however that the lost DTD ever exactly matched the description of HTML given at that time, and this seems to be a continuous problem in the documentation of HTML. Usually due to proofreading or typographical errors in the DTD, the description of the version of HTML given in the official W3C documents contains some few differences from what is actually valid per the DTD. Except for the first three versions, for which no official DTDs exist, in the tag and attribute lists given here it is the DTD and not the narrative documentation which has been followed as the source of which tags and attributes actually exist in each level of HTML. Such differences will be noted as they arise.

As the HTML languages get more advanced, the number of tags and attributes expands tremendously, so what I propose to do here is list them in "clusters." Many tags belong in some specific category of tag, or else are used only with a small group of related tags. As we transition to more and more advanced versions of HTML we will add clusters and also add to the existing clusters already introduced in the older versions.

"HTML 0.a" A Prototype HTML, The Very Beginning

The very oldest known surviving HTML file was last modified on Tue, 13 Nov 1990 15:17:00 GMT. Its complete text reads exactly as follows:

<title>Hypertext Links</title>
<h1>Links and Anchors</h1>
A link is the connection between one piece of
<a href=WhatIs.html>hypertext</a> and another.

This can be found here. Following its link brings one to many other pre-HTML 2 files, many of which, dating from before 1993, can also be found here. These very oldest HTML files from the first four months or so of HTML all bear a number of distinct characteristics that make them detectable as further examples of the very earliest form that HTML took. This earliest form of HTML was never documented, but it can be seen and gleaned from the following files that were written in it and which all stem from the earliest period for HTML. They can be distinguished by the use of lower case for the tags (similar to XHTML), no <nextid> tag, and overall very primitive use of only a very small group of the oldest and most original HTML tags. These files also appear to have been hand-entered, since several typographical errors have been detected in the HTML tags. Occasionally, such hand-entered HTML would surface again, distinguished by the same characteristics, sometimes even in files generated by the NeXT HTML generator, in those places where they have been manually retouched.

It appears that Tim Berners-Lee only wanted to spend a very few days extracting only the most basic and rudimentary functions from CERN's SGMLguid language for this small prototype markup language, so that he could instead devote his time to the far larger and more complex issues of creating the kind of network and servers and overall infrastructure that would make his "link" idea work. As such this first crude pass at HTML was aimed only at providing a basic framework of document formatting/structuring commands within which to position his true brainchild, the <a href=Filename.html> link. Obviously he intended from the start that this markup language would soon be expanded upon by both himself and others, and things did indeed unfold just that way. The following file showcases the look and source of all such files that have been identified as having come from this earliest prototype HTML period: Source and Contents of the oldest surviving HTML Files. Basically, most of the remaining such files are documentation (papers, requests for papers, announcements, trip reports, tutorials) pertaining to the European Conference on Hypertext, 1990 (ECHT90), which took place on 27-30 November 1990. These files were last updated in December 1990 and the first half of January 1991, with the exception of the last four which, though also dating from this period, received minor updates, one in late 1991 and the other three in mid-1992.

Here is the set of tags and attributes for HTML as it had existed during its first several months, from its beginnings in October 1990 until mid-January 1991 when the next style appears:

Document Head Tags
Block Level Body Tags
Raw Text Tags
Hyper Link Tags
Listing Tags
Dictionary Tags
Inline Formatting Tags

Even this list, very short as it is, may be somewhat artificially expanded with the inclusion of <xmp> and <listing> and the dictionary tags which, though seen in a test file from the period, do not otherwise seem to have been used in any other files from this period and which may not have been implemented. Clearly, the inclusion of these otherwise unseen tags within the one test file signaled an intention that these would one day exist (and the file does show them), if not at first, then at least eventually. Though the last two of the sample files also include the dictionary tags, both of them have been modified on August 26, 1992, quite possibly to add these dictionary tags, and in one of which these tags (<DL>, <DT>, and <DD>) appear in upper case while all other tags in the file are in lower case, showing that they were indeed entered by a later hand some time after the original creation of the file. The <ADDRESS> tag is seen only in one file modified on June 11, 1992 and not included in the test file nor otherwise mentioned, so I opine that that tag is not native to this early period.

The difficulties with producing an SGML DTD for this initial period are that it did not need (perhaps did not even take, as there are no counterexamples) any quotation marks to surround the attribute values. So by SGML standards, any href or name value containing either a "/" or a "#" or a "@" would cause an error with parsing-checking engines. Furthermore, the <p> tag appears to have been variously used, at first being used as a paragraph separator and empty element (much like <BR> would later come to be, but producing a visible gap to separate the paragraphs), but then towards the end of this period as a container, designed to have text "contained" between an opening tag and a closing tag. Some </p> closing tags have been spotted in these most ancient HTML files. Starting with the next version of HTML the <P> tag occurs strictly as a separator tag clear until the second draft of HTML 1.
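The quoting problem described above can be illustrated with a minimal Python sketch. It assumes the SGML reference concrete syntax, under which an unquoted attribute value may contain only letters, digits, periods, and hyphens; the function names are hypothetical and are not part of any validator.

```python
import re

# Assumption: SGML reference concrete syntax name characters
# (letters, digits, period, hyphen). Any other character forces quoting.
_NAME_CHAR = re.compile(r"[A-Za-z0-9.-]")
_NAME_TOKEN = re.compile(r"[A-Za-z0-9.-]+")


def needs_quotes(value):
    """Return True if the attribute value cannot legally be left unquoted."""
    return _NAME_TOKEN.fullmatch(value) is None


def offending_chars(value):
    """List the distinct characters that force quoting, in order of first appearance."""
    seen = []
    for ch in value:
        if _NAME_CHAR.fullmatch(ch) is None and ch not in seen:
            seen.append(ch)
    return seen
```

Under these assumptions a bare value such as Filename.html passes, while any value containing "/", "#", or "@" is flagged, which is the situation a strict parser would report for the earliest files.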

All the tags and attributes from this era, except for the provisional and largely unimplemented <hpn> tags, have proven to be of lasting value, although one of them would quickly have the distinction of being the first useful tag to go away. <ol> is commonly spotted in this period, but was deleted for the next version of HTML, and only appears to have been reintroduced with the DTDs by Dan Connolly, and finally got accepted back in by Tim Berners-Lee for the DTD for HTML 1.k. The <ol> tag has been a useful workhorse tag since its reintroduction, and remains current to this day along with all other tags from this initial period. Though the <p> and <li> tags would continue to modern times, their usage has changed a bit. In this earliest period they served as separator tags (except for <p>, which briefly served as a container tag). With the next version, and clear into HTML 1, they (along with the dictionary tags <dt> and <dd>) served exclusively as separator tags, only to be all changed to container tags for the second draft of HTML 1, and they remain thus to modern times. At this point, the <title> tag was clearly considered optional, given that so many of these early files omit it.

I have constructed a DTD for this file type and it can be used for validating files by affixing the following declaration to a file and submitting it to the W3C validating engine:

<!DOCTYPE HTML SYSTEM "http://www.the-pope.com/html0.a.dtd">
  1. Working XMP and LISTING Tag Element Examples
  2. Working Tag Element examples of HP0, HP1, HP2, and HP3

"HTML 0.c" The Earliest Documented HTML

One of the very oldest files that shows signs of being produced by the NeXT HTML editor is this one dating from 23 January, and this file is the oldest surviving use of the <NEXTID> tag. There is also a file near the end of January 1991 which seems to be from this era since all tags are in upper case, but it lacks a <NEXTID> perhaps merely because there are no name links in the file so NEXTID would have been set to zero, the same as not having a NEXTID value to the file. This oldest surviving description of HTML was mostly written in the version of HTML it describes, but it is still helpful to see a random sampling of the surviving files written in this version of HTML, as shown in Source and Contents of a sampling of the earliest documented HTML. Even that description is not totally precise in that it lists one tag which was as yet not in evidence and probably did not exist, namely the <BASE> tag, and one attribute, the TYPE attribute of <A> which also is not seen, anywhere. Here is the set of tags and attributes for HTML as it developed and was used over the course of most of 1991 and 1992:

Document Level Tags
Document Head Tags
Block Level Body Tags
Raw Text Tags
Hyper Link Tags
User Input Tags
Listing Tags
Dictionary Tags
Inline Formatting Tags

This short list is also artificially lengthened by several placeholder tags listed there more as room for expansion rather than realized functions. The <ISINDEX>, <DIR>, and <MENU> tags were described as existing, but appear to have arisen quite late in this period, and no examples from the period of these tags have been found. At this point in time, <NEXTID> did not take an attribute proper, but simply a number itself (e. g. <NEXTID 3>). Conspicuously absent at this point were all document level tags other than <PLAINTEXT>, which is another reason why HTML had to make those tags (<HTML>, <HEAD>, and <BODY>, and their closing tags) all optional. Though they did not as yet exist in this version, they too were among those proposed and being discussed.

I am sure that this "version" of HTML did not all burst upon the scene full and complete in January 1991. Some tags, such as <NEXTID>, indeed go to the very beginning of this period, while others such as <DIR> and <MENU> obviously arrived later on. The dictionary tags (<DL>, <DT>, and <DD>) appear to have been implemented by the end of January 1991, while the document level tag <PLAINTEXT> and the block-level raw text tag <LISTING> are reported as having been implemented in at least some browsers by as early as February and March of 1991, though no examples of either of these (apart from demonstration files) are found until months later on. Because the <ISINDEX> tag was only to be generated by a program running on the server, and such executable programs were seldom preserved and doubtless would not even run on contemporary machines anyway, there is no way to tell what use if any it had during this period, or when it was introduced. During the course of this period, the <TITLE> tag would rapidly transition from being optional to being required for an HTML document. It is not clear that the <HPn> tags were ever actually implemented, as no examples of them have been found apart from the test file from the previous period. A direct equivalent of that test file, but stemming from this period, omits the <HPn> example instances, and also omits the <OL> element contained in its older form from the previous period.

This phase of HTML represents a clear distancing from SGML and the CERN SGMLguid language as anything but a potential source of some possible tag names and meanings (e. g. <Hn>). Such a period does seem to precede a certain academic discipline being applied to HTML, almost more of a "Joe Sixpack" stream of consciousness mode of programming, rather than any rigorous academic structure. When this version of HTML was conceived (mostly back in January, 1991), it seems to have been uncertain what academic discipline, if any, would be applied to HTML in order to define it as a language. Instead, just any old "good idea" that happened to come along would be readily added to the language as a new and useful feature. Later on, a distinction would gradually form between those structures and elements and attributes coming from the industry versus those coming from the academic community, but at this point everything was welcome.

So, when this version was developed, SGML and SGMLguid had pretty much become merely one more source among many for some "ideas" or "influences" from which to draw some useful features. The <NEXTID> tag in particular, as seen and also as written up in the documentation at that time, merely took a number instead of a formal SGML-type attribute. This is bad SGML and there is no good way to build a DTD to define this version of HTML on account of this one thing alone. (For the "html0.c" I generated with which to validate this kind of file, this <NEXTID> feature was approximated by making the attribute an enumerated list of possible explicit values, which only go up to 127, so any value higher than that will not validate. However, no files from this period appear to have ever gotten to so high a value for this to be a problem.) This is also the phase in which another SGML-hostile tag would be introduced, namely <PLAINTEXT>.
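The enumerated-list workaround mentioned above can be sketched in Python as follows. This is only an illustration of the idea, not the text of the actual html0.c DTD: the attribute name N, the starting value of 0, and the #IMPLIED default are all assumptions made for the example.

```python
def enumerated_attlist(element, attribute, max_value):
    """Build an SGML ATTLIST whose attribute accepts only the tokens 0..max_value.

    Assumption: with an enumerated token group, SGML tag minimization lets the
    attribute name be omitted, so a bare number after the tag name can parse.
    """
    values = "|".join(str(n) for n in range(max_value + 1))
    return f"<!ATTLIST {element} {attribute} ({values}) #IMPLIED>"


def would_validate(value, max_value=127):
    """Return True if the bare value is one of the enumerated tokens."""
    allowed = {str(n) for n in range(max_value + 1)}
    return value in allowed
```

With a ceiling of 127, a value of 3 is accepted while 128 is rejected, which matches the limitation described in the paragraph above.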

I have constructed a DTD for this file type and it can be used for validating files by affixing the following declaration to a file and submitting it to the W3C validating engine:

<!DOCTYPE HTML SYSTEM "http://www.the-pope.com/html0.c.dtd">
  1. Working NEXTID Tag Element Example
  2. Working PLAINTEXT Tag Element Example
  3. Working ISINDEX Tag Element Example
  4. Working DIR and MENU Tag Element Examples

"HTML 0.d" Making the NeXT Editor SGML Compliant

The W3C historical archives contain no NeXT-HTML-Editor-generated files dating from either November 24 or 25 1992. Perhaps Mr. Berners-Lee spent this time installing and shaking the last bugs out of a significantly new version of the NeXT editor which implemented the first rigorous attempt to comply with SGML as the formal basis of HTML as a language. All files in the archive dating from November 23 of that year and earlier are as described in the above paragraphs here, but files dating from November 26 until at least as late as May 24, 1993 bear the characteristics of this next version of the NeXT editor, except where obviously generated earlier and then hand-edited during that time, or else generated by someone else who had not as yet loaded the new version of the NeXT HTML Editor.

By May 1992 it had been pretty much decided that SGML would provide the academic basis for this new computing invention called HTML. Nearly all structures in the language could be expressed and defined in an SGML DTD, leaving only a few details which could not be defined in SGML and so which had to be changed. Also, a number of new ideas were being thought of, and drafts of the DTD easily incorporated many of these ideas. So important were these ideas that a couple of them were actually incorporated into the description of HTML as last modified on November 13, 1992. There was the <BASE> element which would not be implemented or even defined in a DTD until HTML 1. And there was also the TYPE attribute of <A> which had no specific list of possible values, only some general discussions as to what sorts of possible values might later on appear. It too was only a placeholder within the documented description of HTML.

There is no formal documentation for this next phase of HTML, only a number of files saved as examples of it in the W3C archives, and a few files that document the Discussions for Future Directions for HTML. In the course of these discussions going on at that time, there were vague proposals which would later lead to the introduction of the <LINK> and <PRE> tags, and there was talk of restoring the ordered list tag (<OL>). These discussions even included the possibility of a <DATE> tag which might even feature an EXPIRES attribute, and a <KEYWORDS> tag to assist automated searches, which together gradually matured and finally emerged in HTML 2 as the far more versatile and useful <META> tag. The tags observed for this period (or reasonably inferred) are as follows:

Comments
Document Level Tags
Document Head Tags
Block Level Body Tags
Raw Text Tags
Hyper Link Tags
User Input Tags
Listing Tags
Dictionary Tags
Inline Formatting Tags

Unlike the previous versions, a DTD can be constructed to validate files of this interim standard in accordance with SGML principles. In particular, the <NEXTID> tag received its attribute N, making it at last SGML compliant, the <PRE> tag had been introduced, and got considerably more use in this period than the previous non-SGML tags it replaced (though I am sure that the non-SGML tags were all still available). Also at this time appears the beginning of a formal document structure, by separating the HTML document into a head (<HEADER>) and body (<BODY>), and confining <TITLE>, <NEXTID>, and <ISINDEX> to the head and the rest to the body. Note however the different name for the head tag than would appear in all future versions of HTML. I have constructed a DTD for this file type and it can be used for validating files by affixing the following declaration to a file and submitting it to the W3C validating engine:

<!DOCTYPE HTML SYSTEM "http://www.the-pope.com/html0.d.dtd">

The only "Lost tag" to be introduced during this phase is the <HEADER> tag, which is exactly replaced with its lasting successor <HEAD> in all future versions. This could almost border on the level of a proprietary tag, since it seems to have belonged to the NeXT editor alone, but since it is the earliest, and since so many W3C archive files from this period bear this unusual tag, I have included it in my successive listings. It and the <HPn> tags are the only tags I list here whose disappearance was so long ago that they show up on no surviving DTD. These tags therefore have the distinction of being the very first truly Lost Tags. Though <OL> disappeared earlier, its subsequent return, complete with even the same form, fit, and function, removes it from the category of Lost Tags.
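The distinguishing marks described for the three earliest phases (all-lowercase tags with no <nextid>, a bare-number <NEXTID 3>, and the <HEADER> tag or <NEXTID N=...>) can be combined into a rough classifier. The Python sketch below is a heuristic under those stated assumptions only; the function name and the "undetermined" fallback are hypothetical, and hand-edited files may well defeat it.

```python
import re

# Tag names, whether in opening or closing tags.
TAG = re.compile(r"<\s*/?\s*([A-Za-z][A-Za-z0-9]*)")
# <NEXTID 3>: a bare number, as described for "HTML 0.c".
NEXTID_BARE = re.compile(r"<NEXTID\s+\d+\s*>", re.IGNORECASE)
# <NEXTID N=3>: the SGML-compliant form described for "HTML 0.d".
NEXTID_ATTR = re.compile(r"<NEXTID\s+N\s*=\s*\"?\w+\"?\s*>", re.IGNORECASE)
# <HEADER>: the head tag peculiar to "HTML 0.d".
HEADER = re.compile(r"<HEADER\b", re.IGNORECASE)


def guess_era(source):
    """Guess which early phase of HTML a file belongs to (heuristic only)."""
    if NEXTID_ATTR.search(source) or HEADER.search(source):
        return "HTML 0.d"
    if NEXTID_BARE.search(source):
        return "HTML 0.c"
    names = TAG.findall(source)
    if names and all(name == name.lower() for name in names):
        return "HTML 0.a"
    # Upper-case tags without NEXTID are ambiguous, as noted for "HTML 0.c".
    return "undetermined"
```

The sketch deliberately refuses to guess for an upper-case file lacking <NEXTID>, since such a file could belong to more than one phase.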

  1. The First Lost Attribute TYPE of A and LINK
  2. A Look at the Experimental Tags/Elements and Attributes

"HTML 0.h" The Oldest Surviving Usable DTD

By 1993, it was pretty much accepted that SGML would indeed be the basis for the new hypertext document format language, HTML, and that an SGML DTD would be the official lexical definition of HTML documents. Though several DTDs were written in 1992, only a rather incomplete and preliminary one of them has survived. After that, the oldest surviving DTD was written by Dan Connolly and dated January 20, 1993. Contained in this DTD are the two earliest attempts to insert stylistic formatting instructions into the HTML, namely the WIDTH attribute of <PRE> and a STYLE attribute (never to be seen again in this form) of the <DL> tag which would soon morph into the COMPACT attribute of all listing tags, <UL>, <OL>, <DIR>, <MENU>, and <DL>.

Comments
Document Level Tags
Document Head Tags
Block Level Body Tags
Raw Text Tags
Hyper Link Tags
User Input Tags
Listing Tags
Dictionary Tags
Inline Data Type Tags
Inline Formatting Tags

This earliest surviving usable DTD seems consistent with what one would expect as the logical progression from the state of things in November 1992, when the HTML as it had been for nearly two years was documented and the first SGML-compliant version of HTML to be generated by NeXT also appeared. Not only does this replace <HEADER> with <HEAD> and return <OL> to the language, but more importantly it adds many basic inline text types which would prove of lasting value, as a far superior replacement of the old <HPn> tags, such that they are no longer found (if indeed they ever occurred in any DTD). Even so, three of these tags would disappear by the second release of an "HTML 1" draft, <KEY>, <DFN>, and <U>, but the latter two would return in HTML 3. It is in this phase that the WIDTH attribute of <PRE> would be introduced as the first temporarily surviving presentational feature, though a companion presentational feature STYLE of the <DL> tag would soon mature into the equally temporarily surviving COMPACT attribute of the listing tags, both of which were only implemented in some very few early browsers. Drawing from the academic and more theoretical minds, this is the point at which the URN and METHODS attributes of <A> and <LINK> would be introduced, only to disappear promptly after HTML 2, but also the useful and surviving TITLE attribute of the same tags. <LINK> is finally introduced here, and in this initial form given simply the exact same attributes as <A>. By HTML 2, the NAME attribute, being obviously useless and nonsensical on this tag, would be quietly deleted from it.

I have adapted this DTD to a format usable with the W3C validating engine and it can be invoked by affixing the following declaration to a file:

<!DOCTYPE HTML SYSTEM "http://www.the-pope.com/html0.h.dtd">
  1. Working KEY Tag Element Example
  2. The Lost Attribute STYLE of DL
  3. The Lost Attribute SOURCE of BLOCKQUOTE
  4. The Lost Attributes URN and METHODS of A and LINK
  5. The Lost Attribute WIDTH of PRE

"HTML 1.k" A First Draft Published for HTML 1

There is no official "HTML 1," but if anything could be properly regarded as at least a semi-official first published draft for an HTML 1, this would have to be it. This first published version of HTML was published in the middle of 1993. For this version of HTML, its reliance upon the previous DTD is obvious, but now some more useful things have been added. In particular, <BASE> at last makes its introduction, an <IMG> tag (imported directly from Mosaic) is also introduced, the TYPE attribute of <A> and <LINK> is at last replaced with the pair of equal but opposite attributes REL and REV, and the STYLE attribute of <DL> finally matures into the COMPACT attribute, but by what appears to be a mistake the COMPACT attribute was omitted from <DL> even as it was added to all of the other listing tags, <UL>, <OL>, <DIR>, and <MENU>. One sees this both from the way STYLE was illustrated in the previous DTD with a value of COMPACT and the way COMPACT is described in this version's DTD, and also from the attached narrative description of this version of HTML. By a similar mistake, the second release of HTML 1 nearly a year later will finally affix COMPACT to the <DL> tag, but then deprive the other listing tags of this attribute.

In fact, there are a number of differences between the HTML 1 described in the narrative documentation versus the DTD presented in the latter portion of the same file. Besides describing COMPACT as an attribute of <DL> it mentions two attributes ALT and ALIGN of <IMG> that the DTD does not support. The narrative would limit <PRE> to taking only certain inline tags where the DTD defines it as accepting all inline tags, including the <KEY> tag which goes altogether unmentioned in the narrative. On the other hand, the narrative mentions the <HPn> tags in the same manner as it mentions the <XMP>, <LISTING>, and <PLAINTEXT> tags, namely as obsolete, without hinting that only the latter are included in the DTD (though sequestered in a different area with a comment to the effect that they are obsolete). Finally, the narrative is quite explicit in stating that the N attribute of <NEXTID> cannot take any letters but must be a number (as indeed was the case in the previous DTD and as observed in all previous versions), blissfully unaware that the DTD text itself expressly permits N to be a "NAME" such as Z67.

So which is the "real" first HTML 1 draft? The way I see it, the DTD is the final arbiter as to what is or is not part of the language. So here is the HTML 1 (first draft) set of tags from the DTD, as published in June 1993:

Comments
Document Level Tags
Document Head Tags
Block Level Body Tags
Raw Text Tags
Hyper Link Tags
User Input Tags
Listing Tags
Dictionary Tags
Inline Data Type Tags
Inline Formatting Tags
Inserted Object Tags

I have adapted this DTD to a format usable with the W3C validating engine and it can be invoked by affixing the following declaration to a file:

<!DOCTYPE HTML SYSTEM "http://www.the-pope.com/html1.k.dtd">
  1. The Lost attribute COMPACT of UL, OL, DL, DIR, and MENU

"HTML 1.m" A More Advanced HTML 1 Version

By May 18, 1994, a more advanced version of the HTML 1 DTD was posted on the web, as seen here, which also introduces the two-level concept seen in HTML 2, where the first level lacks forms and the second has forms. In this form the <P> tag was also transitioning from a separator tag to being a container tag. Indeed, depending upon options selected, it could go either way, with the default being that it is a container tag. The other interior listing tags (<LI>, <DT>, and <DD>) however all unconditionally become container tags where they had been separator tags from their beginnings up until this point. For backward compatibility, the closing tags on these listing tags are all optional. The <KEY> tag is going away as something no longer wanted: it can be allowed, but by default is not, though at least it was recognized as something more than merely a typographical error for <KBD>. Unlike HTML 2, which has only two selectable yes/no options (resulting in four flavors of HTML 2), this release of HTML 1 has three such selectable yes/no options, resulting in 8 different combinations. To get any combination but the basic default, however, requires adding SGML commands to an HTML <!DOCTYPE> declaration, any of which will result in an unwanted "]]>" showing at the top of the file. The three optional flags are:

In addition, the <KEY> and <U> tags can also be individually enabled, though the default is to exclude them, and the <NEXTID> can be individually disabled, though the default is to include it. The documentation states that the <IMG> element takes "two attributes," lists three attributes (SRC, ALIGN, and ALT), and the DTD lists all three of these plus a new ISMAP attribute (copied from HTML+) that for the first time in HTML enabled image maps. Also introduced are the <BR> and <HR> elements. In addition, a new <STRIKE> element is introduced into the DTD but nothing is said of this in the documentation. More can be read about this version of HTML (this one is otherwise pretty well documented) at This more complete HTML Version.

Comments
Document Level Tags
Document Head Tags
Block Level Body Tags
Raw Text Tags
Hyper Link Tags
User Input Tags
Listing Tags
Dictionary Tags
Inline Data Type Tags
Inline Formatting Tags
Inserted Object Tags

I have adapted this DTD to a format usable with the W3C validating engine and it can be invoked by affixing the following declaration to a file:

<!DOCTYPE HTML SYSTEM "http://www.the-pope.com/html1.m.dtd">

It was during this period of HTML that the SGML declaration began to be used as a way to invoke the IETF and Web Consortium HTML validator. Files from this time (and no older) sometimes contain the following:

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">

This declaration, once introduced, pointed to the current working version of the HTML DTD, whatever it was, and with whatever changes were being quietly introduced in the definition. With the introduction of HTML 2.0 this began to point to the most default flavor of HTML 2.0, namely that which is level 2 (permits forms) and not strict (also permits the deprecated elements). This DTD declaration is still valid to this day and still points to this particular flavor of HTML 2.0.

HTML+: An Early Attempt to Encompass Everything

Dave Raggett took a different view of writing DTDs than Dan Connolly or Tim Berners-Lee, who both seemed to take a rather minimalist approach. Dave's approach was to make the DTD into some sort of "superset" of all the known tags and attributes currently found out there, whether from proprietary browsers or academia or wherever. So he made an attempt to explore far ahead of the official draft for HTML 2 already in progress, which he called HTML+, as a hint of what was to come later, also published late 1993. In April of 1994, an updated version of HTML+ was published. Detailed comparisons between the surviving versions of HTML+ and the Draft of HTML 3.0 (somewhat similar to the comparison listings in this file) can be seen here.

Dave Raggett's approach would resurface in the original drafts of HTML 3.0 (which were never quite approved), and much of his influence also shows in the final 3.2 version of HTML despite some rather draconian cuts made to it. The problem with it was that the industry was coming up with a whole host of presentational features (stylistic commands), whereas the more academically disciplined approach of Connolly and Berners-Lee held that such things should be relegated to some sort of "style sheets," though that concept had not been so much as prototyped as yet. The industry, impatient for the development of style sheets, went ahead and introduced its own tags and attributes at will, and Raggett's approach simply captured that, accidentally also serving as a kind of endorsement of them all that Connolly and Berners-Lee did not wish to have given.

HTML 2.0: The Oldest Recognized W3C Version

HTML 2.0 is documented by the Web Consortium in the following document: Hypertext Markup Language - 2.0, dated September 22, 1995. It represented the first widely usable form of HTML, and the oldest which the W3C validating engines can validate without warnings. As it stands, this language comes in four basic flavors, two varieties each of two levels. The first level is the more stripped-down HTML format, but in fact differs from the second level only in that it tolerates no FORMS tags or attributes. Each of these levels has a regular and a "strict" form, and the strict form prohibits the tags that were already deprecated in HTML 1 as "obsolete," namely <NEXTID>, <PLAINTEXT>, <XMP>, and <LISTING>. Apart from those differences, the varieties of HTML 2.0 scarcely differ.

HTML 2 added relatively little, namely only the <META> tag, and an attribute VERSION to the <HTML> tag, but it also corrected the various problems with the COMPACT attribute, finally assigning it to all the list tags equally, and also fixed a long-running bug with <PLAINTEXT> by making it a container tag, with the closing tag optional (i. e. meant to be left off). The <LINK> element at last has the useless NAME attribute removed from it as one other bit of clean up, but the <DFN> element was eliminated without explanation. The big difference that HTML 2 introduced was the use of a strict versioning system (indicated with the new VERSION attribute) that would be used for all versions of HTML from this point onwards. For the first time the various flavors of the current version of HTML can each be selected without using clumsy SGML commands. However, the correct place for the version information has proved to be in the <!DOCTYPE> declaration, not the VERSION attribute, so the new attribute was deprecated in HTML 4.0 and 4.01 and eliminated altogether in XHTML 1.0 (but mysteriously resurrected in XHTML 1.1!). This new attribute is not mentioned anywhere in the documentation for HTML 2.0, but can only be gleaned from inspecting the DTD in detail. In the example files called from this file (where acceptable) I have populated this attribute with its correct value for the file in question.

In HTML 2.0, I identify the following clusters:

Comments
Document Level Tags
Document Head Tags
Block Level Body Tags
Raw Text Tags
Hyper Link Tags
User Input Tags
Listing Tags
Dictionary Tags
Inline Data Type Tags
Inline Formatting Tags
Inserted Object Tags

As one can see from the color coding of the above, only a very few of the specific items codified in HTML 2 were either listed as deprecated or quietly slated for deletion, such that most of them could not be used in the strict versions of HTML 2, and others would not be seen in any future version of HTML.

  1. The Introduction of the ALIGN Attribute of IMG and INPUT
  2. The Extraordinary Flexibility of META and LINK
  3. The HTML VERSION Attribute (distinct from the <!DOCTYPE> declaration).

HTML 2.0 can be validated with any of the following:

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0 Level 1//EN">
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0 Strict Level 1//EN">
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0 Strict//EN">

It should be obvious which of these is for level 1 or level 2 and which are for strict or non-strict.
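For readers who prefer it spelled out, the following minimal Python sketch reads the level and strictness directly from the four public identifiers listed above. The function name is hypothetical; the rule that an identifier without "Level 1" means level 2 follows from the description of the default flavor given earlier.

```python
def describe_html2_doctype(public_id):
    """Return the level and strictness implied by an HTML 2.0 public identifier."""
    prefix = "-//IETF//DTD HTML 2.0"
    suffix = "//EN"
    if not (public_id.startswith(prefix) and public_id.endswith(suffix)):
        raise ValueError("not an HTML 2.0 public identifier: " + public_id)
    # Whatever sits between the prefix and suffix names the flavor.
    middle = public_id[len(prefix):-len(suffix)].strip()
    return {
        "level": 1 if "Level 1" in middle else 2,  # level 2 permits forms
        "strict": "Strict" in middle.split(),      # strict excludes obsolete tags
    }
```

Applied to the plain identifier "-//IETF//DTD HTML 2.0//EN", this yields level 2 and non-strict, the most permissive of the four flavors.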

HTML 3.2: A Much Richer Standard, and a Road Not Taken

HTML 3.2 is documented by the Web Consortium in the following document: HTML 3.2 Reference Specification, dated January 14, 1997. This next phase of the development of HTML represents an interesting side path taken as a result of industry pressure to which the IETF and W3C barely consented (in fact, IETF bowed out of any further HTML considerations before any version of HTML past 2.0 could be published). The industry had found it expedient to hang all manner of presentational bells and whistles off the existing HTML tags, and many of them had already gained considerable industry-wide support, even from competing browser and other vendors. In what is almost a capitulation to the fait accompli of all these presentational bells and whistles, the W3C most grudgingly, and with much delay, finally released their 3.2 standard. There had been a 3.0 standard being proposed, but it was never published as anything more than a draft, and it featured only all the more such vendor-specific bells and whistles.

In HTML 3.2, I identify the following clusters:

Comments
Document Level Tags
Document Head Tags
Block Level Body Tags
Raw Text Tags
Hyper Link Tags
User Input Tags
Listing Tags
Dictionary Tags
Inline Data Type Tags
Inline Formatting Tags
Inserted Object Tags
Table Tags

So one can see the addition of many such universally accepted extensions to HTML, some of which, such as MAP or TABLE, would endure, but many of which would soon be shifted over to style sheets, once that technology could itself be settled upon. As it is, the implementation of style sheets still fails to be anywhere near as consistent among user agents as the implementations of many of the above presentational attributes and tags. In addition, many proprietary extensions were supported by only one vendor, with competing extensions supported by the competing vendor, so that many commonly used tags at the time were not approved within any W3C standard, though a few such would finally make it into HTML 4.

  1. Working APPLET Tag Element Example
  2. Working TYPE, START, and VALUE Listing Attribute Examples
  3. Working BGCOLOR and BACKGROUND Attributes of BODY, TABLE, TR, TH, and TD Tag Element Examples
  4. Working FONT, BASEFONT, U, and STRIKE Tag Element Examples
  5. Working ALIGN, NOSHADE, SIZE, and WIDTH Attributes of HR Tag Element Examples
  6. The Addition of the ALIGN Attribute of many Tags/Elements, and the CENTER Tag Element Examples
  7. Working CLEAR Attribute of BR Tag Element Examples
  8. The use of STYLE and SCRIPT in HTML 3.2
  9. Working NOWRAP Attribute of TH and TD Tag Element Examples
  10. Remaining Sizing and Spacing Attributes of HTML 3.2
  11. Netscape Proprietary Extensions to HTML: BLINK
  12. Netscape Proprietary Extensions to HTML: LAYER
  13. Netscape Proprietary Extensions to HTML: ILAYER
  14. Netscape Proprietary Extensions to HTML: Javascript Stylesheets
  15. Microsoft Proprietary Extensions to HTML: MARQUEE
  16. Microsoft Proprietary Extensions to HTML: COMMENT
  17. Microsoft Proprietary Extensions to HTML: EMBED
  18. Microsoft Proprietary Extensions to HTML: BGSOUND
  19. Microsoft Proprietary Extensions to HTML: DYNSRC, CONTROLS, START, and LOOP Attributes of IMG
  20. Microsoft Proprietary Extensions to HTML: VBScript

HTML 3.2 can be validated with the following:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
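
To give a feel for how these presentational extensions look when gathered into one page, here is a small sketch of an HTML 3.2 document. The title, colors, and text are my own placeholders, invented only for illustration, and the selection of tags is just a sampling of the items listed above, not an exhaustive demonstration.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<HTML>
<HEAD>
<TITLE>HTML 3.2 presentational markup (illustrative sample)</TITLE>
</HEAD>
<BODY BGCOLOR="#FFFFFF" TEXT="#000000">
<!-- CENTER and FONT carry the presentation directly in the markup -->
<CENTER>
<FONT SIZE="+2" COLOR="#CC0000">A Centered Heading</FONT>
</CENTER>
<!-- The ALIGN, NOSHADE, SIZE, and WIDTH attributes of HR -->
<HR ALIGN="CENTER" NOSHADE SIZE="4" WIDTH="50%">
<!-- The TYPE and START attributes of OL, and the VALUE attribute of LI -->
<OL TYPE="A" START="3">
<LI>This item is lettered C
<LI VALUE="10">This item is lettered J
</OL>
<!-- The NOWRAP attribute of TH, with U and STRIKE inside a cell -->
<TABLE BORDER="1">
<TR><TH NOWRAP>A heading that will not wrap</TH>
<TD><U>Underlined</U> and <STRIKE>struck out</STRIKE></TD></TR>
</TABLE>
</BODY>
</HTML>
```

Nearly everything this sample does with attributes and formatting tags is what style sheets would later be expected to take over.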

HTML 4.01: A Mature and Stable Markup Language

HTML 4.01 is documented by the Web Consortium in the following document: HTML 4.01 Specification, dated December 24, 1999. In HTML 4.01, many of the presentational features introduced in HTML 3.2 are deprecated, generally in favor of style sheets, but occasionally in favor of scripting languages as well. Nevertheless, several proprietary extensions known at the time of HTML 3.2 were introduced in HTML 4.01 as deprecated elements, just so they can be used properly, if used at all. HTML 4.01 Transitional is the broadest expression of HTML, excluding only a small handful of the very oldest tags, and at the new end omitting only the XHTML-exclusive Ruby text feature. HTML 4.01 is preceded by an almost identical version known as HTML 4.0; only a few minor typographical errors were fixed between the two.

HTML 4.0 (and 4.01) introduced a new <IFRAME> element which is not spoken of as deprecated, and yet the Strict forms of this version of HTML do not admit it. Perhaps this was done accidentally: the newly introduced <FRAMESET> element, and the other framing elements and attributes, were intentionally left out of the Strict versions, and <IFRAME> may simply have been picked up along the way due to the similarity of its name.

HTML 4.01 possesses several small groups of attributes which it hangs off of nearly every tag. To save display space, I am compacting the events category of attributes under the one heading %EVENTS, which replaces the lengthy list of onClick, onDblClick, onMouseDown, onMouseUp, onMouseOver, onMouseMove, onMouseOut, onKeyPress, onKeyDown, and onKeyUp. In HTML 4.01, I identify the following clusters:

Comments
Document Level Tags
Document Head Tags
Block Level Body Tags
Raw Text Tags
Hyper Link Tags
User Input Tags
Listing Tags
Dictionary Tags
Document Update Tags
Inline Data Type Tags
Inline Formatting Tags
Inserted Object Tags
Table Tags
Frames Tags

  1. Attribute PROFILE of HEAD
  2. Handling Different Languages in HTML
  3. Working DATASRC, DATAFLD, DATAFORMATAS, and DATAPAGESIZE Attributes of TABLE and other Tags Elements Examples
  4. ISMAP Attribute of Forms INPUT Tag Element
  5. Remaining Sizing, Spacing, and Coloring Attributes of HTML 4.0 and 4.01

HTML 4.0 and 4.01 can be validated with the following:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Frameset//EN">
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0//EN">
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Frameset//EN">
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN">
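
As a rough illustration of what deprecation in favor of style sheets means in practice, here is a sketch of an HTML 4.01 Strict page. The class name "banner", the colors, and the text are hypothetical, chosen by me only for this example. The comment in the body shows the older presentational form, which the Transitional DTD still admits but the Strict DTD does not.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN">
<HTML LANG="en">
<HEAD>
<TITLE>HTML 4.01 Strict with a style sheet (illustrative sample)</TITLE>
<STYLE TYPE="text/css">
  /* Presentation moves out of the markup and into the style sheet */
  .banner { text-align: center; color: #CC0000; font-size: larger; }
</STYLE>
</HEAD>
<BODY>
<!-- Transitional only:
     <CENTER><FONT COLOR="#CC0000" SIZE="+2">A Centered Heading</FONT></CENTER> -->
<P CLASS="banner">A Centered Heading</P>
<!-- onClick is one of the ten attributes gathered under %EVENTS -->
<P><BUTTON TYPE="button" onClick="window.alert('clicked')">Press me</BUTTON></P>
</BODY>
</HTML>
```

The same page declared as Transitional could use either form, which is part of why Transitional is the broadest of the three DTDs.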

XHTML 1.0: Changing the Basis from SGML to XML

XHTML 1.0 is documented by the Web Consortium in the following document: XHTML™ 1.0 The Extensible HyperText Markup Language, dated August 01, 2002. XHTML 1.0 possesses nearly the exact same list of tags and attributes as HTML 4.01, since it is little more than a conversion of HTML 4.01 into an XML-based language. Nevertheless, it does introduce some changes from HTML, most notably the introduction of the xml:lang, xmlns, and xml:space attributes, and the complete elimination of the version attribute. It also made the use of lower case tags and attributes mandatory, which I reflect here by listing all tags and attributes (even those expired) in lower case. Previously, tags and attributes were case-insensitive, but upper case was most typical so as to make the tags stand out from the text as much as possible. As I did above with HTML 4.01, I am compacting the events category of attributes under the one heading %events, which replaces the lengthy list of onclick, ondblclick, onmousedown, onmouseup, onmouseover, onmousemove, onmouseout, onkeypress, onkeydown, and onkeyup. In XHTML 1.0, I identify the following clusters:

Comments
Document Level Tags
Document Head Tags
Block Level Body Tags
Raw Text Tags
Hyper Link Tags
User Input Tags
Listing Tags
Dictionary Tags
Document Update Tags
Inline Data Type Tags
Inline Formatting Tags
Inserted Object Tags
Table Tags
Frames Tags

XHTML 1.0 can be validated with the following:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
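
The changes described above are easiest to see in a complete, if tiny, document. What follows is only a sketch of an XHTML 1.0 Strict page; the title and text are placeholders of my own. It shows the lower case tags, the xmlns and xml:lang attributes on the root element, and the XML habit of closing every element and quoting every attribute value.

```html
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>XHTML 1.0 Strict (illustrative sample)</title>
</head>
<body>
<!-- Empty elements such as br and hr must be explicitly closed -->
<p>Every element is closed,<br />
and every attribute value is quoted.</p>
<!-- xml:space is fixed to "preserve" on pre -->
<pre xml:space="preserve">Spacing   is   kept   here.</pre>
<hr />
</body>
</html>
```

Carrying both xml:lang and lang on the root element is a common compatibility habit for pages that may also be served to older HTML user agents.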

XHTML 1.1: Leaving the Past Behind

XHTML 1.1 is documented by the Web Consortium in the following document: XHTML™ 1.1 - Module-based XHTML, a working draft dated February 16, 2007. XHTML 1.1 starts from the same list as XHTML 1.0 Strict, then deprecates a few more items and adds a few new ones. By far the most notable addition is the new Ruby text feature, which allows small-type text to be affixed to regular-sized text. As I did above with HTML 4.01, I am compacting the events category of attributes under the one heading %events, which replaces the lengthy list of onclick, ondblclick, onmousedown, onmouseup, onmouseover, onmousemove, onmouseout, onkeypress, onkeydown, and onkeyup. In XHTML 1.1, I identify the following clusters (to be consistent with my previous listings; these coincide only loosely with the various XML modules selected for "normal" XHTML 1.1, with some minor differences):

NOTE: The following listing has not yet been validated!

Comments
Document Level Tags
Document Head Tags
Block Level Body Tags
Raw Text Tags
Hyper Link Tags
User Input Tags
Listing Tags
Dictionary Tags
Document Update Tags
Inline Data Type Tags
Inline Formatting Tags
Inserted Object Tags
Table Tags
Frames Tags

  1. Working rubytxt Tag Element Examples

XHTML 1.1 can be validated with the following:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
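
Since the Ruby text feature is the headline addition, here is a sketch of how simple ruby markup sits inside an XHTML 1.1 page. The title and the annotated text are placeholders of my own. The rb element holds the base text, rt holds the small-type annotation, and the rp elements supply fallback parentheses for user agents that do not understand ruby.

```html
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head>
<title>XHTML 1.1 ruby text (illustrative sample)</title>
</head>
<body>
<p>
  <!-- A ruby-aware user agent sets the rt text in small type beside the rb text -->
  <ruby>
    <rb>WWW</rb>
    <rp>(</rp><rt>World Wide Web</rt><rp>)</rp>
  </ruby>
</p>
</body>
</html>
```

A user agent with no ruby support should simply show "WWW (World Wide Web)", which is the whole purpose of the rp elements.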

HTML 5.0 and XHTML 2.0: What's to Come

XHTML 2.0 is documented by the Web Consortium in the following document: XHTML™ 2.0, a working draft dated July 26, 2007. However, protest from the Joe Sixpack programmers (and others) is pressuring the Web Consortium to return to advancing regular SGML-based HTML, at least as a parallel effort to their development of XHTML. This is driven not only by the relative ease with which HTML can be written, as opposed to the severe strictness of XHTML, but also by the "spaghetti-dtd" nature of XHTML 1.1's document definition. Suddenly there is no longer simply one big file that does it all, but instead quite a swelter of little files that are next to impossible to get through so as to ascertain the actual contents of the language. So, in response to the call to resume development of HTML, a sort of draft or forum for HTML 5.0 has been made available. As one can see from the working drafts, there is much about HTML 5.0 and XHTML 2.0 that is still in a state of flux, and much that remains to be written. Furthermore, it is impossible to validate a document purporting to be an XHTML 2.0 document at this time, and only some of the more trivial HTML 5.0 documents can be correctly validated, since the language itself is still under development and is based on neither XML nor SGML, so only a kind of state-machine validator is available for beta testing.

This file, "lostHTML.html," is XHTML 1.0 Strict compliant.
This file also validates without error as CSS level 2.1.
The XHTML 1.1 counterpart to this file, "lostHTML.htm," is XHTML 1.1 compliant.
The Source and Contents of the oldest surviving HTML Files file "htmlsource0.a.html" is HTML 4.01 Strict compliant.
The Source and Contents of a sampling of the earliest documented HTML file "htmlsource0.c.html" is HTML 4.01 Strict compliant.
The Discussions for Future Directions for HTML file "htmldirections.html" is HTML 4.01 Strict compliant.
The Source and Contents of a sampling of the earliest SGML-compliant HTML file "htmlsource0.d.html" is HTML 4.01 Strict compliant.
The Crazy SGML Stuff demonstration file "sgmlcraz.html" is HTML 4.01 Strict compliant, but with 5 warnings.

