SILMARIL consultants

Where would you like to go tomorrow?

Users of Panorama™ can now get full SGML from HTML Pro.

(Note that if your browser displays the literal text &trade; above instead of the trademark symbol, you should contact your browser maker and ask them to support the wider range of ISO character entities.)

Best viewed with the browser of your choice.

The HTML Professional DTD

A composite Document Type Definition for the HyperText Markup Language

DRAFT version 0 revision 11 (1 January 1997)

Copyleft notice

The HTML composite DTD is Copyleft (c) 1996-97 by Silmaril Consultants and is protected by the terms of the GNU General Public License, a copy of which is included in the distribution of this software as file gnugpl.html.

You may freely distribute and use this material, provided that nothing is done to prevent its further distribution or use, and subject to the condition that it may not be distributed without a similar condition, including this condition, being imposed on the subsequent user. Modifications should be reported to the author so that they can be incorporated in a controlled manner.


Management summary

Description

HTML Pro is an up-to-date, reliable, and robust Document Type Definition (DTD) for the HyperText Markup Language, which is used to create hypertext pages as used in the World-Wide Web.

Objectives

HTML Pro is designed for use by professional document authors and designers, to replace earlier versions of HTML whose use in standards work subjected them to constraints which made them inflexible. HTML Pro provides extended support for editors and authors which is not available in most other versions.

Composition

The present DTD is built as a composite of all known HTML DTDs to date. It has been arranged in such a way as to allow authors and designers to support their target browsers but still ensure that the files they create are conformant. This means they are valid, parsable instances of SGML (ISO 8879, Standard Generalized Markup Language, the language in which HTML is written).

Benefits

Conformance to a standard is important for businesses and other organizations, and for authors, designers or other individuals because it provides flexibility. This means that portable, reusable, long-term or large-scale Web information can be edited, controlled, and reused on multiple platforms using a wider range of software than is possible when tied to a specific manufacturer.

Web pages can now easily be based on the same SGML syntax used for a wide range of other corporate and institutional document projects (archives, servers, databases, marketing and publication systems, etc), on intranets as well as on the Internet.

Availability and maintenance

The HTML Pro DTD is available free of charge from the sources shown below, and is updated following the professional advice of the members of the www-html mailing list.

HTML Pro is an SGML system conforming to International Standard ISO 8879, Standard Generalized Markup Language.


Background

Derivation

The first HTML Document Type Definition was devised to support the original versions of World-Wide Web software in the early 1990s. A revised and more widely-discussed version was codified as HTML 2.0 by the Internet Engineering Task Force Working Group on HTML, and adopted by the IETF as a Proposed Standard, RFC1866, in November 1995.

A more advanced but experimental version, HTML+ (now obsolete), had been under discussion for several years, and many of its features were republished as HTML3 in an Internet Draft (March 1995). On expiry, this draft was taken back into discussion, and passed to the World-Wide Web Consortium (W3C), when it took over HTML development from the IETF in 1996.

The W3C has attempted to reconcile the sometimes conflicting aspirations of some of its members by publishing two experimental versions of HTML, in May and July 1996 respectively: Wilbur, the codename for HTML 3.2, which is largely HTML 2.0 disguised with the addition of stylesheet and scripting support; and Cougar, which is a less structured version of HTML3 with some changes to the tables model, and some of the new content markup removed. While both have their adherents and proponents, both are highly selective, and neither can be said to be in any way stable or comprehensive. Both also suffer from a lack of discipline, because they try to accommodate historical, even obsolete, practice in a fully backwards-compatible manner. The result is that both use a flat content model for the body of the document, and have reintroduced (without warning or explanation) several elements which the original designers had clearly marked for removal.

Companies implementing Web software (particularly browsers), and individuals implementing HTML pages, have throughout this period continually sought additional markup facilities for their own and their customers' purposes. In some cases element names were simply invented on an ad hoc basis, with no attempt to define when or how they can be used; in other cases, market position was available as leverage to add support for them ahead of their inclusion in a DTD. While the former method can be excused in individuals on grounds of ignorance, the same argument cannot be made for corporations. The latter method is natural from a marketing point of view, but it has led to a further Balkanization of the DTD position, with separate versions needed to describe differing implementations by Microsoft, Sun, Netscape and others, as well as the implementation of more practical versions by SoftQuad and other editor makers.

HTML Pro

The edition of HTML presented here, codenamed Aardvark, is a composite of all known versions to date. It contains all the elements published (as far as they can be found or identified) in the various forms of HTML since 1990, in a manner which can be used by editors, browsers, parsers, databases, search engines, formatters, indeed any conforming application of ISO 8879, Standard Generalized Markup Language (SGML, the language in which HTML is written).

HTML Pro is intended for use by the information professional who has neither the time nor the inclination to participate in the interesting but destructive competition between rival browser makers, and who is unwilling to mislead her clients by creating anything other than valid, conformant instances of HTML.


The DTD

There are now 140 elements in the DTD sharing 245 different attribute names. Of these attributes, seven are common to all elements (see below). The element with the greatest number of attributes is INPUT (52).

Downloading and installing

Downloading

The HTML Pro DTD is available in several forms: this file with active links is maintained at http://www.arbornet.org/~silmaril/dtds/html/htmlpro.html and the distribution files are held at ftp://ftp.ucc.ie/pub/html/

The full or the minimal distribution should be saved to a suitable temporary directory before installing (unless you are running WinZip, which automates this). There is a DOS htmlpro.zip version available for non-Windows users.

The installation needs to be able to create (or if they exist, to write to) the /usr/local/lib and /html directories (on a PC, this is C:\usr\local\lib and C:\html; on a Mac this means creating folders at the top level in your hard disk).

If you use a shared computer system, you may not have permission to create directories or folders at this level, in which case you need to obtain that permission from your systems administrators, or have them do the installation for you.

It is quite possible to install HTML Pro elsewhere on your hard disk, but you will need to consult the manuals for your applications software to see what directories the resulting files need to be moved into before it will work. The list of files in the distribution (see below) shows in [square brackets] the directories or folders into which files go by default.

Installation

The UNIX version unpacks with the commands:

gunzip htmlpro.tar.gz
tar -xvf htmlpro.tar

UNIX users who intend to use Emacs with psgml-mode to parse SGML Public Identifiers in their files will need to create downcased soft links to the subdirectories DTD and ENTITIES within any of the Public Owner directories in /usr/local/lib/sgml, for example with the commands (in each directory needed):

ln -s DTD dtd
ln -s ENTITIES entities

The PC/Windows version is a WinZip self-installer which unpacks automatically when you double-click it. If you prefer not to execute such programs, you should download the DOS version instead and unzip it manually.

Emacs users under DOS and Windows do not need to change any directory names.

The Mac version is a BinHex'd zip file which can be unpacked by UnStuffIt or a similar dearchiving program. If the BinHexing proves unnecessary, you can instead download the DOS zip file referred to above and drop it onto UnStuffIt. Macs are poorly catered for by the SGML software community, so little information is available about where files like DTDs and character entity collections should be installed: if you have this information, please mail us.

Versioning

The versions of HTML included in this edition were taken from public copies of DTDs and fragments on the Web: this list appears in slightly altered form later on, showing the codes used in the DTD to identify sources.

  • HTML 1.0 (CERN, 1992)
  • HTML+ (Proposal, 1993)
  • HTML 2.0 (RFC1866, November 1995)
  • HTML3 (Internet Draft, March 1995)
  • *Netscape Navigator DTD (October 1994)
  • *Sun Microsystems HotJava DTD (July 1995)
  • *Microsoft Internet Explorer DTD (March 1996)
  • SoftQuad HoTMetaL Pro 2.0 DTD (1995)
  • Internet Draft: Compound Documents in HTML (Burchard & Raggett)
  • Internet Draft: Internationalization of HTML (now RFC2070)
  • Form-based File Upload in HTML (RFC1867)
  • HTML Tables (RFC1942)
  • Wilbur (HTML 3.2, W3C, May 1996)
  • Cougar (W3C, July 1996)
  • *WebTV (December 1996)

    *Asterisked items were not produced by the companies themselves, but are compatibility releases generated from the HTML3 DTD or other sources in order to provide some foundation for the experiments and claims of the companies. While the browser makers are occupied in developing the interface, it is unlikely that these DTDs will be revised.

    My thanks to all the many authors and contributors to these DTDs, whose notes and comments have made it easier to work out what to do, and whose research has made it possible to identify to some extent which browser can support which elements.

    The current version of the HTML Pro DTD is 0.11, and it is released for comment by the relevant interested parties on the mailing list and elsewhere. Please forward all comments and suggestions to the mailing list for discussion.

    Basic changes introduced

    Very few changes to structure have been made, although those familiar with the internals of previous versions will notice the large amount of additional material from the other sources, including the unilaterally-introduced elements invented by some browser makers (which in many cases has involved a guess at their structure, as the companies have often been reluctant to provide details).

    The header is as defined in HTML3, with the addition of NOSCRIPT and BGSOUND. The content model has been rearranged to allow intermingled META and LINK elements, and still try to help editing systems keep header elements in some semblance of logical order (my thanks to Joe English for this code). The use of META is now expanding to include the Dublin Core metadata qualifiers (Title, Subject, Author, Publisher, OtherAgent, Date, ObjectType, Form, Identifier, Relation, Source, Language, and Coverage).

    The most significant structural change in the text body is to the way in which content models are used. The original mechanism was for %body.content; to contain both structural and descriptive elements as peers, so that there was no distinction between, say, a list and some running text in italics. The implication was originally that inline markup, which was identified as %flow; in earlier versions, should be contained in a paragraph-level element, and this is in fact what SoftQuad's HoTMetaL Pro does when it imports existing invalid HTML.

    The DTD has not been modularised in any way: on the contrary, it is currently an entirely flat file with only the barest minimum of entities. This is deliberate at this stage, as some non-SGML tools were used in the conversion of the code from the various sources, and following the text formatting established by Near&Far for the layout proved useful for this purpose, where each declaration and each attribute definition is followed by a line containing a comment. In a future version, use will be made of the work of Altheim, Connolly, Maler, and Allen in defining the modularised version of HTML.

    There are now four classes of elements, defined by the following entities:

    Figure 1. Taxonomy of HTML text body elements

    Parameter entity
        Elements represented

    %structure;
        Elements which are generally used to contain the continuous text of the document.
        DIV, CENTER, H1 to H6, P, UL, OL, DL, DIR, MENU, PRE, XMP, LISTING, BLOCKQUOTE, BQ, MULTICOL, NOBR, FORM, TABLE, ADDRESS, FIG, BDO, NOTE, and FN; plus WBR, LI, and LH

    %insertions;
        Elements which usually contain special-purpose material, or no text material at all.
        BASEFONT, APPLET, OBJECT, EMBED, SCRIPT, MAP, MARQUEE, HR, ISINDEX, BGSOUND, TAB, IMG, IMAGE, BR, plus NOEMBED, SERVER, SPACER, AUDIOSCOPE, and SIDEBAR

    %text;
        Elements within the %structure; which directly contain running text.
        Descriptive or analytic markup: EM, STRONG, DFN, CODE, SAMP, KBD, VAR, CITE, Q, LANG, AU, AUTHOR, PERSON, ACRONYM, ABBREV, INS, DEL, and SPAN
        Visual markup: S, STRIKE, I, B, TT, U, NOBR, WBR, BR, BIG, SMALL, FONT, STYLE, BLINK, TAB, BLACKFACE, LIMITTEXT, NOSMARTQUOTES, and SHADOW
        Hypertext and graphics: A and IMG
        Mathematical: SUB, SUP, and MATH
        Documentary: COMMENT, ENTITY, ELEMENT, and ATTRIB

    %formula;
        Mathematical content.
        BOX, ABOVE, BELOW, VEC, BAR, DOT, DDOT, HAT, TILDE, ROOT, SQRT, ARRAY, SUB, SUP, B, I, T, and BT

    The distinction is that the insertions class can be peer with text as well as with structure, whereas text can nest only within structure.
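
    As an illustration only, the taxonomy above can be treated as a lookup table. The sketch below copies a small subset of the element names listed for each parameter entity (it is not the full DTD, and the names TAXONOMY and classes_of are hypothetical) and reports which classes a given element is listed under, which shows that a few elements appear in more than one class.

```python
# A partial copy of the taxonomy: only a handful of the elements listed
# for each parameter entity are included here, for illustration.
TAXONOMY = {
    "%structure;": {"DIV", "P", "UL", "OL", "PRE", "NOBR", "TABLE", "WBR"},
    "%insertions;": {"IMG", "BR", "HR", "TAB", "OBJECT"},
    "%text;": {"EM", "STRONG", "NOBR", "WBR", "BR", "TAB", "IMG",
               "SUB", "SUP", "B", "I"},
    "%formula;": {"BOX", "SQRT", "SUB", "SUP", "B", "I"},
}


def classes_of(element):
    """Return the sorted names of the classes in which an element is listed."""
    name = element.upper()
    return sorted(cls for cls, members in TAXONOMY.items() if name in members)


print(classes_of("P"))
print(classes_of("br"))
print(classes_of("sub"))
```

    A lookup of this kind is only a reading aid: the content models in the DTD itself remain the authority on where an element may occur.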

    Some changes have been made to the content models of the text elements to accommodate this, but the only noticeable ones are SUP and SUB, which may now contain math elements even when used in non-math mode. The exclusion exceptions for MATH have been significantly reduced because it would appear from discussions with mathematicians that it is in fact unobjectionable for math to contain many of the elements previously proscribed. The effect of these changes is an increase in the amount of mixed content (see note below), which some analysts regard as pernicious.

    The exclusion exceptions for PRE now include TT and BR, as neither has any relevance in preformatted fixed-width material, but SUB and SUP are now permitted, as it seems perfectly reasonable that an author might want to represent typewritten material containing subscripts and superscripts. PRE can now include IMG to enable character-cell-sized image glyphs, and APPLET, EMBED, and OBJECT to allow the dynamic generation of fixed-width content.

    Accessibility issues

    The ICADD fixed attributes which were added by the late Yuri Rubinsky have been reinserted for those elements to which they were attached in RFC1866. Work is proceeding with the developers of the ICADD DTD to identify what equivalents to add for the remaining elements, and this is being coordinated with a similar exercise being undertaken on the Text Encoding Initiative (TEI) DTD. A note to RFC2070 says:

    HTML contains SGML Document Access (SDA) fixed attributes in support of easy transformation to the International Committee for Accessible Document Design (ICADD) DTD "-//EC-USA-CDA/ICADD//DTD ICADD22//EN". ICADD applications are designed to support usable access to structured information by print-impaired individuals through Braille, large print and voice synthesis. For more information on SDA & ICADD:

    • ISO 12083:1993, Annex A.8, Facilities for Braille, large print and computer voice
    • ICADD ListServ ICADD%ASUACAD.BITNET@ARIZVM1.ccit.arizona.edu
    • Usenet news group bit.listserv.easi
    • Recording for the Blind, +1 800 221 4792

    Following the informal discussions at SGML'96, some recent suggestions from Harvey Bingham have been incorporated in respect of Tables (additional attributes).

    Recent alterations and additions

    I am grateful to Foteos Macrides for keeping me updated on the implementations used for Lynx and other browsers. His request for a TITLE attribute for FORM and MAP triggered a rework of the common attributes which are now applied to all elements:

  • ID
  • CLASS
  • STYLE
  • TITLE
  • LANG
  • DIR
  • NOINDEX

    Internationalization

    The LANG and DIR attributes (and some changes to the SGML Declaration) are reintroduced directly from the Internet Draft on Internationalization of HTML (now RFC2070). Authors should note the following (from RFC2070) and ensure that they use only valid numeric character references:

    The SGML declaration, like that of HTML 2.0, specifies the 32 character numbers 128 to 159 (decimal) as UNUSED. This means that numeric character references within that range (eg &#146) are illegal in HTML. Neither ISO 8859-1 nor Unicode contains characters in that range, which is reserved for control characters.
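
    A minimal sketch of how an author might check a file for such references, assuming decimal references only; the function name find_unused_refs and the sample string are hypothetical and are not part of the HTML Pro distribution.

```python
import re

# Matches decimal numeric character references such as &#146; or &#146
# (the closing semicolon is treated as optional here).
NUMERIC_REF = re.compile(r"&#([0-9]+);?")


def find_unused_refs(text):
    """Return the numeric character references in text whose number lies
    in the range 128 to 159 (decimal), which the declaration marks UNUSED."""
    bad = []
    for match in NUMERIC_REF.finditer(text):
        number = int(match.group(1))
        if 128 <= number <= 159:
            bad.append(match.group(0))
    return bad


sample = "It&#146;s 5&#176;C &#169; 1997"
print(find_unused_refs(sample))
```

    A validating SGML parser remains the proper test; a scan like this only catches the one mistake described in the quotation.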

    The relevant RFCs referred to here and in the DTD source code are included in the distribution in /usr/local/lib/sgml/IETF/HTML for want of a better location.

    Indexing

    Scott Preece suggested the NOINDEX attribute to provide a means of allowing an author to specify to indexing engines that the document is not considered meaningful for index-gathering (valuable for pages originally generated from scripts but stored as temporarily static files).

    HTML3 revisited

    Foteos also brought to my attention the need to replace more of the material originally in HTML3 which had fallen away, as well as the attributes supported by Lynx, including the DISABLED attribute for INPUT, OPTION, SELECT, and TEXTAREA; the PLAIN attribute for UL; CONTINUE for OL; ALT for BGSOUND, APPLET, AREA, IMG, and EMBED; and the HREF attribute (and ACTION as a synonym, from Mosaic) for ISINDEX.

    It is worth warning that some browser and editor manufacturers are accepting or generating very careless code of the form

    <isindex http:foo.bar.com/blort>

    which is not only unparsable, and thus unusable anywhere else, but unnecessary given the availability of HREF.

    Drazen Kacar suggested the reinclusion of the CLEAR attribute on all structural elements, so this has been done. The assumption is that it is not needed on inline or descriptive markup.

    Embedded objects

    The controversy over EMBED vs OBJECT was no less acrimonious than the earlier one over APP or APPLET vs FIG, and has been treated by including all five elements as implemented. The now-expired Internet Draft on Compound Documents by Paul Burchard and Dave Raggett included PARAMS as an attribute on EMBED, rather than the attribute soup approach piloted by some browser-makers, and the W3C now has their own document on the subject.

    Mathematics

    Little comment has been received to date about math content. The view was expressed at SGML'96 that the whole question of SGML math was still up for grabs, so it is unlikely that any significant change to the HTML3-based content will be made unless mathematicians present some kind of conclusion.

    I am grateful to Dave Carter, however, for the suggestion to include FONT in the model, so that manual/visual adjustment can be made to large symbols where browsers fail to render them automatically. There is as yet no default inclusion of the ISOams* character entities (although that can easily be done with a simple edit).

    There is one significant new attribute, ROLE on MATH, which can take the values INLINE or DISPLAY and has been made #REQUIRED. This corresponds with standard mathematical usage as reflected in TeX. Following hints from Tom Magliery, it seems likely that most math authors are writing in TeX and making GIFs.

    TV browsing

    There has been one special group of added elements: those implemented by WebTV without a DTD. The elements added are AUDIOSCOPE and SIDEBAR in the insertions class, and BLACKFACE, LIMITTEXT, NOSMARTQUOTES, and SHADOW in the text class. A substantial number of specialist attributes have been added to handle the WebTV output, and these are identified by (T) in the DTD source.

    Plugin support

    The only requested addition has come from FirstFloor Software, who provide the Smart Bookmarks module shipped in Netscape's PowerPack. This allows for specification on any A (anchor) element of the date and time (HTTP format, eg Sun, 06 Nov 1994 08:49:37 GMT) of a bulletin, plus displayable text and links to the bulletin and an image.
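
    For readers unfamiliar with the HTTP date format mentioned above, the following sketch parses and regenerates the example timestamp using Python's standard library. It says nothing about how the Smart Bookmarks attributes themselves are named or processed; the function name parse_http_date is hypothetical.

```python
from email.utils import format_datetime, parsedate_to_datetime


def parse_http_date(stamp):
    """Parse an HTTP-format date such as 'Sun, 06 Nov 1994 08:49:37 GMT'
    into a timezone-aware datetime object."""
    return parsedate_to_datetime(stamp)


stamp = "Sun, 06 Nov 1994 08:49:37 GMT"
parsed = parse_http_date(stamp)
print(parsed.isoformat())
# Writing the value back out in the same format (usegmt requires UTC).
print(format_datetime(parsed, usegmt=True))
```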

    Manufacturers who are experimenting with additional markup are invited to send details (with a DTD fragment) so that the elements can be included in a subsequent release of HTML Pro. The address for submissions is silmaril@m-net.arbornet.org

    SGML experiment

    The astute reader will have noticed the new elements ELEMENT and ATTRIB, which have been added to this version to test their use in documentation, in order to accompany the already proposed ENTITY and COMMENT. It is not the intention that they should remain past v1r0 unless a formal approach is made to the then controllers of the HTML standard.

    The COMMENT element is in any case wholly redundant, and it is assumed that the element was invented in an uncontrolled burst of misdirected enthusiasm, coupled with a significant failure to read the HTML specification, as COMMENT does not correctly implement comments at all (embedded markup gets interpreted instead of ignored, in implementations to date).

    Documentation of markup sources

    Where it has been possible to find out who invented an element, this is noted in the comment which follows every element and attribute in the DTD source code. This comment is in the form used by Microstar's Near&Far, which calls it a title:

    <!element foo - - %bar;
     --<Title>(X)comment goes here-->
    ...
    <!attlist foo %commonatts;
     bar CDATA #IMPLIED
     --<Title>(Y)comment goes here-->

    The attribution is given by quoting one or more single-letter codes in parentheses immediately before the comment (see Figure 2 for the codes). I am grateful to the authors of the WebTV proposals for their work on identifying some of these origins.
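
    A minimal sketch of how these attribution comments might be pulled out of the DTD source, assuming every comment follows the form shown in the example exactly. The regular expression, the function name parse_titles, and the second comment in the sample fragment are illustrative assumptions, not part of the distribution.

```python
import re

# Assumed comment form, taken from the example:
#   --<Title>(X)comment goes here-->
# The parentheses hold one or more single-character source codes.
TITLE_COMMENT = re.compile(r"--<Title>\(([^)]*)\)(.*?)-->", re.DOTALL)


def parse_titles(dtd_text):
    """Return a list of (codes, comment) pairs, where codes is the list
    of single-character source codes quoted before the comment text."""
    results = []
    for match in TITLE_COMMENT.finditer(dtd_text):
        codes = list(match.group(1))
        results.append((codes, match.group(2).strip()))
    return results


fragment = """<!element foo - - %bar;
 --<Title>(X)comment goes here-->
...
<!attlist foo %commonatts;
 bar CDATA #IMPLIED
 --<Title>(SN)another comment-->
"""

for codes, comment in parse_titles(fragment):
    print(codes, comment)
```

    Splitting the parenthesised string into characters reflects the statement that more than one single-letter code may be quoted, as in the (SN) and (?M) entries in the element list.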

    Figure 2. Browser codes used in attribution comments

    Code    Source
    1       HTML, CERN original version (1)
    2       HTML 2.0 (RFC1866), the formal standard
    3       HTML3, the expired Internet Draft
    C       Cougar, a W3C experiment
    W       Wilbur, HTML 3.2, a W3C stopgap
    M       Microsoft proposals
    N       Netscape proposals
    S       Sun proposals
    L       Lynx proposals
    T       WebTV proposals
    I       Internationalization (I18N, RFC2070)
    F       Form-based file uploads (RFC1867)
    O       Compound Documents in HTML (Burchard & Raggett)
    P       HTML Pro experimental or enabling mechanism
    D       ICADD/HTML Interest Group (from SGML'96)
    B       FirstFloor Software's Smart Bookmarks Netscape plugin
    ?       Unknown, information welcomed

    The HTML3 concept of HTML-specific character entity files has been ditched, and this version includes the whole of ISOlat1, ISOlat2, ISOnum, ISOpub and ISOtech, which should have been done years ago.

    Mixed Content

    The infamous problem of mixed content in list items (and elsewhere) has been tackled head on by simply permitting it: text data is allowed as well as paragraphs and other structural markup like lists, tables, and forms. This may cause some less well-endowed editors a little grief, but the advantages of being able to tweak the performance characteristics of various browsers are too great to pass up.

    For those unfamiliar with mixed content, this is a position which obtains when an element can contain both running text (PCDATA in SGML terms), and the inline markup that usually accompanies it, as well as structural markup. Inline markup is the term given to markup which normally occurs at paragraph level, such as emphasis, hypertext links, or font style changes like bold or italics. Structural markup is that which defines the bones of the document: paragraphs, lists, headings, sections, tables etc. Examples of mixed content are (as above) list items, or table cells, which can contain both running text and other structures like headings or further lists.

    White space (linebreaks, tabs, or spaces) between structural elements is usually not significant to an SGML parser, as it cannot have any meaning in terms of the document structure: meaning is borne by the markup at this level. For this reason, HTML browsers (and some SGML display systems) disregard or remove such white space, so it can freely be inserted by editors and authors for their own visual comfort and ease of editing, as it will not affect the display or use of the document. White space within running text (at the paragraph level, for example) is of course very significant, as it separates words.

    Where the content of an element is mixed, therefore, there is an ambiguity about whether any space is there for redundant (non-significant) purposes, or if it is a part of the textual data content. A machine simply cannot tell: only a human can detect this from the contextual meaning. To enable parsing to work, linebreaks are not permitted by the rules of SGML in certain circumstances immediately prior to, between, and immediately following, structural elements in mixed content.

    This has been regarded as unavoidable, given that different usages are needed, and the same has been permitted within the cells of tables, as established practice dictates. Rather than produce several DTD versions, one permitting text where another permits only further markup (the lax vs strict versions of many HTML DTDs to date), it has been assumed that professional users would be equipped with a suitable editor capable of handling the behavioral and formatting requirements of mixed content.


    Usage

    It is assumed that the user is aware of the meaning and usage of the elements of HTML, and of the principles of SGML. Those who are unfamiliar with these will find help in the only SGML-compliant book on the use of HTML, The World-Wide Web Handbook by Peter Flynn (ITCP 1995, ISBN 1-85032-205-8), which covers HTML 2.0 and HTML3. The Reference Card which accompanied the book is also available on the Web and for download as a PostScript file to print.

    A more extensive list including the proposed additional elements, with a short comment on their intended usage, is included in the HTML Pro distribution as file htmlpro.tag (as used in SoftQuad's RulesBuilder), and is reproduced here in Figure 3.

    Figure 3. Alphabetical list of HTML Pro elements: The Periodic Table of the Elements for HTML

    Element        Source  Description
    A              (1)     Hypertext anchor
    ABBREV         (3)     Abbreviation
    ABOVE          (3)     Numerator of fraction
    ACRONYM        (3)     Acronym
    ADDRESS        (1)     Address block
    APP            (S)     Obsolete name for APPLET
    APPLET         (N)     Java or other applet
    AREA           (N)     Area in clientside imageMAP
    ARRAY          (3)     Math array
    ATOP           (3)     Top half of unlined fraction
    ATTRIB         (P)     SGML attribute name
    AU             (3)     Author name
    AUDIOSCOPE     (T)     Visual display of a sound played
    AUTHOR         (L)     Author
    B              (2)     Unidentified bold type
    BANNER         (3)     Unscrollable banner
    BAR            (3)     Math overbar
    BASE           (1)     Home URL of this file
    BASEFONT       (M)     Base size for subsequent fonts
    BDO            (I)     BiDirectional override
    BELOW          (3)     Denominator of fraction
    BGSOUND        (M)     Background sound
    BIG            (3)     Larger type
    BLACKFACE      (T)     Extra bold
    BLINK          (N)     Blinking
    BLOCKQUOTE     (1)     Block (indented) quotation
    BODY           (1)     Body of the text
    BODYTEXT       (3)     Identifies body after BANNER
    BOX            (3)     Math fraction
    BQ             (3)     Alternate BLOCKQUOTE
    BR             (2)     Forced linebreak
    BT             (3)     Math bold typewriter type
    CAPTION        (3)     Caption of TABLE or FIGure
    CENTER         (N)     Arbitrary centering
    CHOOSE         (3)     Math binomial choice
    CITE           (3)     Citation (book, product, etc)
    CODE           (2)     Program code
    COL            (3)     Column spec in TABLE
    COLGROUP       (3)     Group of COLs
    COMMENT        (?M)    Visible comment (superfluous)
    CREDIT         (3)     Credits for FIGure (illustration)
    DD             (1)     Definition list discussion
    DDOT           (3)     Math double-dot accent
    DEL            (3)     Text marked for deletion
    DFN            (3)     Definition of new (index) term
    DIR            (1)     Directory list
    DIV            (3)     Structural division
    DL             (1)     Definition list
    DOT            (3)     Math dot accent
    DT             (1)     Definition list term
    ELEMENT        (P)     SGML element name
    EM             (1)     Emphasis
    EMBED          (M)     Embedded object
    ENTITY         (?M)    SGML or other entity name
    FIELDSET       (?N)    Group of FORM fields
    FIG            (3)     Figure or illustration
    FIGTEXT        (3)     Dummy text for FIGures
    FN             (3)     Footnote
    FONT           (N)     Specific font change
    FORM           (2)     Fill-in form
    FRAME          (N)     Subdivision of user's window
    FRAMESET       (N)     Group of FRAMEs
    H1             (1)     Top-level heading
    H2             (1)     Second-level heading
    H3             (1)     Third-level heading
    H4             (1)     Fourth-level heading
    H5             (1)     Fifth-level heading
    H6             (1)     Sixth-level heading
    HAT            (3)     Math hat (circumflex) accent
    HEAD           (1)     Documentation header
    HR             (1)     Horizontal rule
    HTML           (1)     Entire HTML document
    I              (2)     Unidentified italics
    IMAGE          (N)     As IMG (superfluous)
    IMG            (2)     Inline image (picture, icon)
    INPUT          (2)     Inline input in FORM
    INS            (3)     Text to be inserted
    ISINDEX        (1)     Input to non-FORM script
    ITEM           (3)     Item in MATH ARRAY
    KBD            (1)     Keyboard key or screen button
    KEYGEN         (N)     Generates PublicKey
    LABEL          (?N)    FORM field name label
    LANG           (3)     Foreign language
    LEFT           (3)     MATH fraction left delimiter
    LH             (3)     List heading
    LI             (1)     List item
    LIMITTEXT      (T)     Width-limited text
    LINK           (2)     Link to other resources
    LISTING        (1)     Computer printout
    MAP            (N)     Clientside imagemap
    MARQUEE        (M)     Marching display
    MATH           (3)     Mathematics
    MENU           (1)     Menu list
    META           (2)     Metainformation
    MULTICOL       (N)     Encloses multicolumn text
    NEXTID         (1)     Edit control value
    NOBR           (N)     Prohibit linebreak
    NOEMBED        (M)     Alternative for browsers with no EMBED
    NOFRAMES       (N)     Alternative for browsers with no FRAMEs
    NOSCRIPT       (N)     Alternative for browsers with no SCRIPT
    NOSMARTQUOTES  (T)     Turns off smart quotes
    NOTE           (3)     Note of any kind
    OBJECT         (W)     Inline embedded object
    OF             (3)     MATH ROOT divider
    OL             (1)     Ordered (numbered) list
    OPTION         (2)     Choice in a FORM SELECTion menu
    OVER           (3)     MATH fraction (BOX) divider
    OVERLAY        (3)     FIGure image overlay
    P              (1)     Paragraph
    PARAM          (S)     Parameter to APPLET or OBJECT
    PERSON         (3)     Personal name
    PLAINTEXT      (1)     Unmarked plain text
    PRE            (3)     Preformatted text
    Q              (3)     Quote (direct speech)
    RANGE          (3)     Text between two SPOTs
    RIGHT          (3)     MATH fraction right delimiter
    ROOT           (3)     MATH root
    ROW            (3)     Row in MATH ARRAY
    S              (2)     Strikeout text
    SAMP           (1)     Inline computer input/output
    SCRIPT         (SN)    Executable inline script
    SELECT         (2)     FORM selection menu
    SERVER         (M)     Invokes server-side push script
    SHADOW         (T)     Shadowed text
    SIDEBAR        (T)     Unscrollable sidebar
    SMALL          (3)     Smaller type
    SPACER         (N)     Inserts concrete spacing
    SPAN           (3)     Identifies arbitrary portion of text
    SPOT           (3)     Start or end of RANGE
    SQRT           (3)     MATH square root
    STRIKE         (3)     Strikeout text
    STRONG         (1)     Strong emphasis
    STYLE          (3)     Embedded stylesheet formatting
    SUB            (3)     Subscript
    SUP            (3)     Superscript
    T              (3)     MATH typewriter type
    TAB            (3)     Relocatable tab across screen/page
    TABLE          (3)     Tabular alignment
    TBODY          (3)     Body within a TABLE
    TD             (3)     Cell of TABLE Data
    TEXTAREA       (2)     Free-form text input in FORM
    TEXTFLOW       (3)     Dummy
    TFOOT          (3)     Foot of a TABLE
    TH             (3)     Row/col header cell in TABLE
    THEAD          (3)     Headings of a TABLE
    TILDE          (3)     MATH tilde accent
    TITLE          (1)     Document title
    TR             (3)     TABLE row
    TT             (1)     Unidentified typewriter type
    U              (1)     Underlined type
    UL             (1)     Unordered (bulleted) list
    VAR            (1)     Computer variable or filename
    VEC            (3)     MATH vector
    WBR            (N)     Wordbreak
    XMP            (1)     Example fixed-format text
    XTML           (P)     Container for corpora of HTML


    Outstanding items

    The machine-generated status of the DTD file meant that there was a substantial amount of legacy comment from the assorted versions used in compositing the DTD. The majority of this has now been edited out so that obsolete comments and parameter entities are not left to confuse the unwary reader, but users who locate undetected residua are asked to report them (to the address below).

    Now that the DTD is stabilizing, there is a need to re-parameterize some of the material, and use some of the architectural material already tested in the modular version of the standard.

    The elements for which no ICADD fixed attribute exists need to be analyzed, and the relevant values added from the International Committee for Accessible Document Design DTD ("-//EC-USA-CDA/ICADD//DTD ICADD22//EN"); work on this is under way.

    A longer-term goal is to devise and maintain publicly a table of which elements and which attributes are actively supported in which browsers.

    There are undoubtedly other errors, both of omission and of commission, and I would be very grateful for details so that they can be fixed: silmaril@m-net.arbornet.org.


    Distribution

    The current distribution includes the following files. The default installation directory is given in [square brackets].

    Software manufacturers who would like compiled versions in their own format included are invited to contact the authors.

    Mapping of Formal Public Identifiers to directory and file names

    +// or -//

    Registration: registered (+) or unregistered (-) under ISO/IEC 9070. Ignored while most owners are unregistered.

    Silmaril//

    ISO/IEC 9070 Public Owner Identifier. Treated as a directory name under /usr/local/lib/sgml/, with spaces replaced by underscore characters.

    DTD or ENTITY

    Class. The class of document is treated as a further subdirectory (note that Emacs psgml-mode expects this to be lowercase; other systems expect uppercase; DOS/Windows doesn't care).

    HTML Pro v0r11 19970101//

    Name. The document name is treated as the filename, with spaces replaced by underscore characters.

    EN

    Language. Ignored by most software (default English).
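
The mapping described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of the distribution: the function name `fpi_to_path` and the `lowercase_class` switch are hypothetical, the example identifier is assembled from the components listed above, and no file extension is added because the text does not specify one.

```python
from pathlib import PurePosixPath

# Root directory named in the text; the rest of this sketch is an assumption.
SGML_ROOT = PurePosixPath("/usr/local/lib/sgml")


def fpi_to_path(fpi: str, lowercase_class: bool = True) -> PurePosixPath:
    """Map a Formal Public Identifier to a directory/file path (hypothetical helper)."""
    fields = fpi.split("//")
    if len(fields) != 4:
        raise ValueError(f"expected four '//'-separated fields in {fpi!r}")

    # Registration (+ or -) and language are ignored, as described in the text.
    _registration, owner, text, _language = fields

    # The text field is "<class> <name>", e.g. "DTD HTML Pro v0r11 19970101".
    doc_class, _, name = text.partition(" ")
    if not name:
        raise ValueError(f"no document name after the class in {fpi!r}")

    # psgml-mode expects a lowercase class directory; other systems expect uppercase.
    doc_class = doc_class.lower() if lowercase_class else doc_class.upper()

    # Spaces in the owner and name become underscores.
    return SGML_ROOT / owner.replace(" ", "_") / doc_class / name.replace(" ", "_")


print(fpi_to_path("-//Silmaril//DTD HTML Pro v0r11 19970101//EN"))
# prints /usr/local/lib/sgml/Silmaril/dtd/HTML_Pro_v0r11_19970101
```

Passing `lowercase_class=False` gives the uppercase `DTD` subdirectory that other systems expect. Owner identifiers that themselves contain a `/` are not treated specially in this sketch.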

    SGML support

    No SGML software is included, as this can all be obtained online. A good list of software and associated information on projects and activities is kept by Robin Cover at http://www.sil.org/sgml/sgml.html.

    Editing

    The minimum to get started editing HTML in a fully-conformant manner is an SGML editor. Some of the most popular cross-platform systems seem to be:

    Browsing

    SoftQuad also do a Web browser add-on for Windows PCs called Panorama. There's a free version online, but if you're planning on using SGML more than trivially, the commercial version, Panorama Pro, is strongly recommended.

    This file can be used with either version, but with PanoFree you need to install the minimal subset of HTML Pro, whereas with PanoPro it will automatically download the right bits and pieces for you. If you've got PanoPro already installed and working, you can go right ahead and click here.

    If you want to get started with the free version, do the following:

    1. Make sure your regular browser (Netscape, Mosaic, Explorer) is already running first: presumably it is, or is at least possible, or you wouldn't be reading this (unless someone else printed it off for you);

    2. Download PanoFree and double-click to install it, making sure you use the default directory (C:\Softquad\Panorama). If it doesn't mention your browser as one it works with, don't worry --- after installation is finished, go to your browser's Options menu and reconfigure your Helper applications for the MIME type text/x-sgml to make file types .sgm and .sgml start the c:\softquad\panorama\panorama.exe program;

    3. Close down and restart your browser so that it picks up the link to Panorama correctly;

    4. Don't be tempted to run Panorama before the installation of HTML Pro is complete;

    5. Download the PanoFree HTML Pro starter kit (about 45Kb) and double-click to install it.

    Now you should be able to click here and view this file using all the facilities of a real SGML application (well, nearly all: some of them are reserved for customers of the commercial version, Panorama Pro).

    Disclaimer

    Silmaril has no connection with or financial interest in SoftQuad, Inc. or with any other company mentioned in this document.


    Don't forget, it's spelled HTML Pro but it's pronounced A-a-r-d-v-a-r-k


    Peter Flynn