INTERNET-DRAFT                                      Charles H. Lindsey
Usenet Format Working Group                   University of Manchester
                                                            April 2001

                          News Article Format

Status of this Memo

This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC 2026.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt.

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

Abstract

This Draft defines the format of Netnews articles and specifies the requirements to be met by software which originates, distributes, stores and displays them. It is intended as a standards track document, superseding RFC 1036, which itself dates from 1987.

Since the 1980s, Usenet has grown explosively, and many Internet and non-Internet sites now participate. In addition, this technology is now in widespread use for other purposes.

Backward compatibility has been a major goal of this endeavour, but where this standard and earlier documents or practices conflict, this standard should be followed. In most such cases, current practice is already compatible with these changes.

[The use of the words "this standard" within this document when referring to itself does not imply that this draft yet has pretensions to be a standard, but rather indicates what will become the case if and when it is accepted as an RFC with the status of a proposed or draft standard.]

C. H.
Lindsey [Page 1] News Article Format April 2001 [Remarks enclosed in square brackets and aligned with the left margin, such as this one, are not part of this draft, but are editorial notes to explain matters amongst ourselves, or to point out alternatives, or to indicate work yet to be done.] [Please note that this Draft describes "Work in Progress". Much remains to be done, though the material included so far is unlikely to change in any major way.] Table of Contents 1. Introduction .................................................. 5 1.1. Basic Concepts ............................................ 5 1.2. Objectives ................................................ 6 1.3. Historical Outline ........................................ 6 1.4. Transport ................................................. 6 2. Definitions, Notations and Conventions ........................ 6 2.1. Definitions. ............................................. 7 2.2. Textual Notations ......................................... 8 2.3. Relation To Mail and MIME ................................. 9 2.4. Syntax Notation ........................................... 10 2.5. Language .................................................. 12 3. Changes to the existing protocols ............................. 13 3.1. Principal Changes ......................................... 13 3.2. Transitional Arrangements ................................. 13 4. Basic Format .................................................. 15 4.1. Syntax of News Articles ................................... 15 4.2. Headers ................................................... 16 4.2.1. Names and Contents .................................... 16 4.2.2. Header Properties ..................................... 17 4.2.2.1. Experimental Headers .............................. 17 4.2.2.2. Inheritable Headers ............................... 18 4.2.2.3. Local Headers ..................................... 18 4.2.2.4. 
Variant Headers ................................... 18 4.2.3. White Space and Continuations ......................... 18 4.2.4. Comments .............................................. 19 4.2.5. Undesirable Headers ................................... 20 4.3. Body ...................................................... 20 4.3.1. Body Format Issues .................................... 20 4.3.2. Body Conventions ...................................... 21 4.4. Characters and Character Sets ............................. 22 4.4.1. Character Sets within Article Headers ................. 23 4.4.2. Character Sets within Article Bodies .................. 24 4.5. Size Limits ............................................... 24 4.6. Example ................................................... 25 5. Mandatory Headers ............................................. 26 5.1. Date ...................................................... 26 5.1.1. Examples .............................................. 26 5.2. From ...................................................... 27 5.2.1. Examples: ............................................ 27 5.3. Message-ID ................................................ 28 5.4. Subject ................................................... 28 5.4.1. Examples .............................................. 29 C. H. Lindsey [Page 2] News Article Format April 2001 5.5. Newsgroups ................................................ 29 5.5.1. Forbidden newsgroup names ............................. 31 5.6. Path ...................................................... 32 5.6.1. Format ................................................ 32 5.6.2. Adding a path-identity to the Path header ............. 33 5.6.3. The tail-entry ........................................ 34 5.6.4. Delimiter Summary ..................................... 35 5.6.5. Suggested Verification Methods ........................ 35 5.6.6. Example ............................................... 36 6. 
Optional Headers .............................................. 37 6.1. Reply-To .................................................. 37 6.1.1. Examples .............................................. 38 6.2. Sender .................................................... 38 6.3. Organization .............................................. 38 6.4. Keywords .................................................. 38 6.5. Summary ................................................... 39 6.6. Distribution .............................................. 39 6.7. Followup-To ............................................... 40 6.8. Mail-Copies-To ............................................ 41 6.9. Posted-And-Mailed ......................................... 42 6.10. References ............................................... 42 6.10.1. Examples ............................................. 43 6.11. Expires .................................................. 43 6.12. Archive .................................................. 43 6.13. Control .................................................. 44 6.14. Approved ................................................. 44 6.15. Replaces / Supersedes .................................... 45 6.15.1. Syntax and Semantics ................................. 45 6.15.2. Message-ID version procedure ......................... 46 6.15.2.1. Message version numbers .......................... 47 6.15.2.2. Implementation and Use Note ...................... 48 6.15.2.3. The Message-Version NNTP extension ............... 50 6.15.2.4. Examples ......................................... 50 6.16. Xref ..................................................... 51 6.17. Lines .................................................... 53 6.18. User-Agent ............................................... 53 6.18.1. Examples ............................................. 54 6.19. Injector-Info ............................................ 54 6.19.1. 
Usage of Injector-Info-header-parameters ............. 56 6.19.1.1. The posting-host-parameter ....................... 56 6.19.1.2. The posting-account-parameter .................... 57 6.19.1.3. The posting-sender-parameter ..................... 57 6.19.1.4. The posting-logging-parameter .................... 57 6.19.1.5. The posting-date-parameter ....................... 57 6.19.2. Example .............................................. 57 6.20. Complaints-To ............................................ 57 6.21. MIME headers ............................................. 58 6.21.1. Syntax ............................................... 58 6.21.2. Content-Transfer-Encoding ............................ 58 6.21.3. Content-Type ......................................... 59 6.21.3.1. Message/partial .................................. 59 6.21.3.2. Message/rfc822 ................................... 60 6.21.3.3. Message/external-body ............................ 60 6.21.3.4. Multipart types .................................. 61 C. H. Lindsey [Page 3] News Article Format April 2001 6.21.4. Character Sets ....................................... 61 6.21.5. Content Disposition .................................. 61 6.21.6. Definition of some new Content-Types ................. 61 6.21.6.1. Application/news-transmission .................... 62 6.21.6.2. Message/news withdrawn ........................... 63 6.22. Obsolete Headers ......................................... 63 7. Control Messages .............................................. 63 7.1. The 'newgroup' Control Message ............................ 64 7.1.1. The Body of the 'newgroup' Control Message ............ 65 7.1.2. Application/news-groupinfo ............................ 65 7.1.3. Initial Articles ...................................... 67 7.1.4. Example ............................................... 67 7.2. The 'rmgroup' Control Message ............................. 68 7.2.1. 
Example ............................................... 69 7.3. The 'mvgroup' Control Message ............................. 69 7.3.1. Single group .......................................... 69 7.3.2. Multiple Groups ....................................... 70 7.3.3. Examples .............................................. 70 7.4. The 'checkgroups' Control Message ......................... 71 7.4.1. Application/news-checkgroups .......................... 72 7.5. Cancel .................................................... 73 7.6. Ihave, sendme ............................................. 74 7.7. Obsolete control messages. ............................... 75 8. Duties of Various Agents ...................................... 76 8.1. General principles to be followed ......................... 76 8.2. Duties of an Injecting Agent .............................. 76 8.2.1. Proto-articles ........................................ 77 8.2.2. Procedure to be followed by Injecting Agents .......... 77 8.3. Duties of a Relaying Agent ................................ 78 8.4. Duties of a Serving Agent ................................. 79 8.5. Duties of a Posting Agent ................................. 80 8.6. Duties of a Followup Agent ................................ 80 8.7. Duties of a Moderator ..................................... 81 8.8. Duties of a Gateway ....................................... 82 8.8.1. Duties of an Outgoing Gateway ......................... 83 8.8.2. Duties of an Incoming Gateway ......................... 84 8.8.3. Example ............................................... 86 9. Security and Related Considerations ........................... 87 9.1. Leakage ................................................... 87 9.2. Attacks ................................................... 87 9.2.1. Denial of Service ..................................... 87 9.2.2. Compromise of System Integrity ........................ 88 9.3. 
Liability ................................................. 89
10. References ................................................... 90
11. Acknowledgements ............................................. 92
12. Contact Addresses ............................................ 92
13. Intellectual Property Rights ................................. 92
Appendix A.1 - A-News Article Format .............................. 93
Appendix A.2 - Early B-News Article Format ........................ 94
Appendix A.3 - Obsolete Headers ................................... 95
Appendix A.4 - Obsolete Control Messages .......................... 95
Appendix B - Collected Syntax ..................................... 96

1. Introduction

1.1. Basic Concepts

"Netnews" is a set of protocols for generating, storing and retrieving news "articles" (which resemble mail messages) and for exchanging them amongst a readership which is potentially widely distributed. It is organized around "newsgroups," with the expectation that each reader will be able to see all articles posted to each newsgroup in which he participates.

These protocols most commonly use a flooding algorithm which propagates copies throughout a network of participating servers. Typically, only one copy is stored per server, and each server makes it available on demand to readers able to access that server.

An important characteristic of Netnews is the lack of any requirement for a central administration or for the establishment of any controlling host to manage the network. A network which limits participation to some restricted set of hosts (within some company, for example) is a "closed" network; otherwise it is an "open" network. A set of hosts within a network which, by mutual arrangement, operates some variant (whether more or less restrictive) of the Netnews protocols is a "cooperating subnet".

"Usenet" is a particular worldwide open network based upon the Netnews protocols, with the newsgroups being organised into recognized "hierarchies". Anybody can join (it is simply necessary to negotiate an exchange of articles with one or more other participating hosts).

Usenet "belongs" to those who administer the hosts of which it is comprised. There is no Cabal with overall authority to direct what is to be allowed. Nevertheless, there do exist agencies within Usenet that have authority to establish policies and to perform administrative functions, but such authority derives solely from the consent of those sites which choose to recognise it (and who can decline to exchange articles with sites which choose not to recognise it). Usually, the authority of such an agency is restricted to a particular hierarchy, or group of hierarchies.

A "policy" is a rule intended to facilitate the smooth operation of a network by establishing parameters which restrict behaviour that, whilst technically unexceptionable, would nevertheless contravene some accepted standard of "Good Netkeeping". Since the ultimate beneficiaries of a network are its human readers, who will be less tolerant of poorly designed interfaces than mere computers, articles in breach of established policy can cause considerable annoyance to their recipients.

Policies may well vary from network to network, from hierarchy to hierarchy within one network, and even between individual newsgroups within one hierarchy. It is assumed, for the purposes of this standard, that agencies with varying degrees of authority to establish such policies will exist, and that where they do not, policy will be established by mutual agreement. For the benefit of networks and hierarchies without such established agencies, and to provide a basis upon which all agencies can build, this present standard often provides default policy parameters, usually introducing them by a phrase such as "As a matter of policy ...".

1.2. Objectives

The purpose of this present standard is to define the protocols to be used for Netnews in general, and for Usenet in particular, and to set standards to be followed by software that implements those protocols.

It is NOT the purpose of this standard to define how the authority of various agencies to exercise control or oversight of the various parts of Usenet is established (that is itself a matter of policy). Nevertheless, it is assumed that such authorities will exist, and tools are provided within the protocols for their use.

1.3. Historical Outline

Network news originated as the medium of communication for Usenet, circa 1980. Since then, Usenet has grown explosively, and many Internet and non-Internet sites participate in it. In addition, the news technology is now in widespread use for other purposes, on the Internet and elsewhere.

The earliest news interchange used the so-called "A News" article format. Shortly thereafter, an article format vaguely resembling Internet mail was devised and used briefly. Both of those formats are completely obsolete; they are documented in Appendix A.1 and Appendix A.2 for historical reasons only. With publication of [RFC 850] in 1983, news articles came to closely resemble Internet mail messages, with some restrictions and some additional headers. [RFC 1036] in 1987 updated [RFC 850] without making major changes.

A Draft popularly referred to as "Son of 1036" [Son-of-1036] was written in 1994 by Henry Spencer. That document formed the original basis for this standard. Much is taken directly from Son of 1036, and it is hoped that we have followed its spirit and intentions.

1.4. Transport

As in this standard's predecessors, the exact means used to transmit articles from one host to another is not specified. NNTP [NNTP] is the most common transmission method on the Internet, but much transmission takes place entirely independently of the Internet. Other methods in use include the UUCP protocol [RFC 976] extensively used in the early days of Usenet, FTP, downloading via satellite, tape archives, and physically delivered magnetic and optical media.

2. Definitions, Notations and Conventions

2.1. Definitions.

An "article" is the unit of news, analogous to a [MESSFOR] "message". A "proto-article" is one that has not yet been injected into the news system.

A "message identifier" (5.3) is a unique identifier for an article, usually supplied by the "posting agent" which posted it or, failing that, by the "injecting agent". It distinguishes the article from every other article ever posted anywhere. Articles with the same message identifier are treated as if they are the same article regardless of any differences in the body or headers.

A "newsgroup" is a single news forum, a logical bulletin board, having a name and nominally intended for articles on a specific topic. An article is "posted to" a single newsgroup or several newsgroups. When an article is posted to more than one newsgroup, it is said to be "crossposted"; note that this differs from posting the same text as part of each of several articles, one per newsgroup. A newsgroup may be "moderated", in which case submissions are not posted directly, but mailed to a "moderator" for consideration and possible posting. Moderators are typically human but may be implemented partially or entirely in software.

A "hierarchy" is the set of all newsgroups whose names share a first component (as defined in 5.5). The term "sub-hierarchy" is also used where several initial components are shared.
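The definitions of "hierarchy" and "sub-hierarchy" above can be illustrated with a short Python sketch. It is not part of this standard. It assumes that the components of a newsgroup name are separated by "." (the formal definition is in 5.5), and the function names `hierarchies` and `in_sub_hierarchy` are hypothetical names chosen for the example.

```python
from collections import defaultdict


def hierarchies(newsgroups):
    """Group newsgroup names by their first component.

    Assumption: components are separated by "." (see 5.5 for the
    normative definition of a component).
    """
    groups = defaultdict(list)
    for name in newsgroups:
        first = name.split(".", 1)[0]
        groups[first].append(name)
    return dict(groups)


def in_sub_hierarchy(name, prefix):
    """Return True if `name` shares all the initial components of `prefix`.

    The comparison is made component by component, so that
    "example.miscellany" is not treated as part of "example.misc".
    """
    name_parts = name.split(".")
    prefix_parts = prefix.split(".")
    return name_parts[:len(prefix_parts)] == prefix_parts


if __name__ == "__main__":
    sample = ["example.test", "example.misc.jokes", "news.announce"]
    print(hierarchies(sample))
    print(in_sub_hierarchy("example.misc.jokes", "example.misc"))
```

Comparing whole components rather than raw string prefixes is the design point here: a plain prefix test would wrongly place a group in a sub-hierarchy whose name merely begins with the same letters.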
A "poster" is the person or software that composes and submits a possibly compliant article to a "posting agent". The poster is analogous to [MESSFOR]'s author(s).

A "posting agent" is the software that assists posters to prepare proto-articles, in compliance with this standard. The proto-article is then passed on to an "injecting agent" for final checking and injection into the news stream. If the article is not compliant, or is rejected by the injecting agent, then the posting agent informs the poster with an explanation of the error.

A "reader" is the person or software reading news articles. A "reading agent" is software which presents articles to a reader.

A "followup" is an article containing a response to the contents of an earlier article (the followup's "precursor"). A "followup agent" is a combination of reading agent and posting agent that aids in the preparation and posting of a followup.

An article's "reply address" is the address to which mailed replies should be sent. This is the address specified in the article's From header (5.2), unless it also has a Reply-To header (6.1).

A "reply agent" is a combination of reading agent and mailer that aids in the preparation and posting of an email response to an article.

A "sender" is the person or software (usually, but not always, the same as the poster) responsible for the operation of the posting agent or, which amounts to the same thing, for passing the article to the injecting agent. The sender is analogous to [MESSFOR]'s sender.

An "injecting agent" takes the finished article from the posting agent (often via the NNTP "post" command), performs some final checks and passes it on to a relaying agent for general distribution.

A "relaying agent" is software which receives allegedly compliant articles from injecting agents and/or other relaying agents, and possibly passes copies on to other relaying agents and serving agents.
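The relaying behaviour just defined, together with the rule that articles bearing the same message identifier are treated as the same article, is what allows flooding to reach every agent while each keeps only one copy. The following Python sketch is illustrative only and is not part of this standard. The function name `flood`, the `peers` mapping and the per-agent `history` sets are assumptions made for the example, and the sketch ignores site policy, the Path header and every other check a real relaying agent would apply.

```python
from collections import deque


def flood(peers, origin, message_id):
    """Simulate flooding one article through a network of agents.

    `peers` maps each agent name to the list of agents it feeds
    (hypothetical topology supplied by the caller).  Returns a pair
    (history, offers): `history` maps each agent to the set of message
    identifiers it has accepted, and `offers` counts every attempt to
    pass the article on, including refused duplicates.
    """
    history = {agent: set() for agent in peers}
    history[origin].add(message_id)
    queue = deque([origin])
    offers = 0
    while queue:
        agent = queue.popleft()
        for neighbour in peers[agent]:
            offers += 1
            if message_id in history[neighbour]:
                continue  # already held: same identifier, same article
            history[neighbour].add(message_id)
            queue.append(neighbour)  # the neighbour relays it in turn
    return history, offers


if __name__ == "__main__":
    topology = {
        "a": ["b", "c"],
        "b": ["a", "c"],
        "c": ["a", "b", "d"],
        "d": ["c"],
    }
    history, offers = flood(topology, "a", "<1@site.example>")
    print(sorted(history), offers)
```

Because a duplicate is recognised solely by its message identifier, the loop terminates even when the topology contains cycles.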
A "news database" is the set of articles and related structural information stored by a serving agent and made available for access by reading agents.

A "serving agent" receives an article from a relaying agent and files it in a news database. It also provides an interface for reading agents to access the news database.

A "control message" is an article which is marked as containing control information; a relaying or serving agent receiving such an article may (subject to the policies observed at that site) take actions beyond just filing and passing on the article.

A "gateway" is software which receives news articles and converts them to messages of some other kind (e.g. mail to a mailing list), or vice versa; in essence it is a translating relaying agent that straddles boundaries between different methods of message exchange. The most common type of gateway connects newsgroup(s) to mailing list(s), either unidirectionally or bidirectionally, but there are also gateways between news networks using this standard's news format and those using other formats.

2.2. Textual Notations

This standard contains explanatory NOTEs using the following format. These may be skipped by persons interested solely in the content of the specification. The purpose of the notes is to explain why choices were made, to place them in context, or to suggest possible implementation techniques.

     NOTE: While such explanatory notes may seem superfluous in principle, they often help the less-than-omniscient reader grasp the purpose of the specification and the constraints involved. Given the limitations of natural language for descriptive purposes, this improves the probability that implementors and users will understand the true intent of the specification in cases where the wording is not entirely clear.

"ASCII" is short for "the ANSI X3.4 character set" [ANSI X3.4].
While "ASCII" is often misused to refer to various character sets somewhat similar to X3.4, in this standard "ASCII" means X3.4 and only X3.4. ASCII is a 7 bit character set. Please note that this standard requires that all agents be 8 bit clean; that is, they must accept and transmit data without changing or omitting the 8th bit.

Certain words, when capitalized, are used to define the significance of individual requirements. The key words "MUST", "REQUIRED", "SHOULD", "RECOMMENDED", "MAY" and "OPTIONAL", and any of those words associated with the word "NOT", are to be interpreted as described in [RFC 2119]. In addition, the word "Ought", when applied to a poster, or to actions of posting and similar agents which a poster may easily override, indicates a recommendation whose violation would do no more than breach established policy, or accepted best practice.

     NOTE: The use of "MUST" or "SHOULD" implies a requirement that would or could lead to interoperability problems if not followed. Although not following an "Ought" recommendation might do no worse than cause extreme irritation to other readers, particularly in the case of the publicly distributed Usenet, that is no reason not to take it seriously. The essential distinction is that enforcement of a "MUST" or "SHOULD" is a matter of ensuring correct implementation, whereas enforcement of an "Ought" is more a matter of sensible design or of social pressure (whose effectiveness should not be underestimated, even though it cannot be prescribed by this standard).

     NOTE: A requirement imposed on a relaying or serving agent should be understood as applying only to articles actually accepted for processing by that agent (since any agent may always reject any article entirely, for reasons of site policy).

All numeric values are given in decimal unless otherwise indicated. Octets are assumed to be unsigned values for this purpose.
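The requirement stated earlier in this section that agents be 8 bit clean can be checked mechanically. The following Python sketch is illustrative only and not part of this standard. The helper names `is_8bit_clean`, `pass_through` and `strip_high_bit` are hypothetical, and the check exercises raw octet values only, without regard to which octets the article syntax actually permits.

```python
def is_8bit_clean(transform):
    """Return True if `transform` hands back every octet value unchanged.

    `transform` stands for whatever processing an agent applies to
    article data; it takes and returns a bytes object.
    """
    sample = bytes(range(256))  # every unsigned octet value, 0 to 255
    return transform(sample) == sample


def pass_through(data):
    """A handler that transmits the data exactly as received."""
    return bytes(data)


def strip_high_bit(data):
    """A deliberately non-compliant handler that clears the 8th bit."""
    return bytes(octet & 0x7F for octet in data)


if __name__ == "__main__":
    print(is_8bit_clean(pass_through))
    print(is_8bit_clean(strip_high_bit))
```

Treating article data as octets rather than as decoded text is one simple way of avoiding accidental changes to the 8th bit.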
Throughout this standard we will give examples of various definitions, headers and other specifications. It needs to be remembered that these samples are for the aid of the reader only and do NOT define any specification themselves.

In order to prevent possible conflict with "Real World" entities and people the top level domain of ".example" is used in all sample domains and addresses. The hierarchy of example.* is also used as a sample hierarchy. Information on the ".example" top level domain is in [RFC 2606].

2.3. Relation To Mail and MIME

The primary intent of this standard is to describe the news article format. Insofar as news articles are a subset of the Mail message format augmented by some new headers, this standard incorporates many (though not all) of the provisions of [MESSFOR], with the aim of enabling news articles to pass through mail systems and vice versa, provided only that they contain the minimum headers required for the mode of transport being used. Unfortunately, the match is not perfect, but it is the intention of this standard that gateways between Mail and News should be able to operate with the minimum of tinkering.

[This standard has been designed to fit on top of the drafts currently in preparation for Mail [MESSFOR]. It is expected that those drafts will have progressed to the RFC stage by the time the present standard is complete, at which time all references to [MESSFOR] in the present text will be replaced by references to that RFC.]

Likewise, this standard incorporates many (though not all) of the provisions of the MIME standards [RFC 2045] et seq which, though designed with Mail in mind, are mostly applicable to News.

2.4. Syntax Notation

This standard uses the Augmented Backus Naur Form described in [RFC 2234].
A discussion of this is outside the bounds of this standard, but it is expected that implementors will be able quickly to understand it with reference to that defining document. Much of the syntax of News Articles is based on the corresponding syntax defined in [MESSFOR] or in the Mime specifications [RFC 2045] et seq, which is deemed to have been incorporated into this standard as required. However, there are some important differences arising from the fact that [MESSFOR] does not recognise anything other than US-ASCII characters, that it does not recognise the MIME headers [RFC 2045], and that it includes much syntax described as "obsolete". NOTE: News parsers historically have been much less permissive than Mail parsers, and this is reflected in the modifications referred to, and in some further specific rules. The following syntactic forms therefore supersede the corresponding rules given in [MESSFOR] and [RFC 2045], thus allowing UTF-8 characters [RFC 2044] to appear in certain contexts (the four rules begining with "strict-" reflect the corresponding original rules from [MESSFOR]). UTF8-xtra-head = %d192-253 UTF8-xtra-tail = %d128-191 UTF8-xtra-char = UTF8-xtra-head 1*UTF8-xtra-tail text = %d1-9 / ; all UTF-8 characters except %d11-12 / ; US-ASCII NUL, CR and LF %d14-127 / UTF8-xtra-char ctext = NO-WS-CTL / ; all of except %d33-39 / ; SP, HTAB, "(", ")" %d42-91 / ; and "\" %d93-126 / UTF8-xtra-char C. H. 
Lindsey [Page 10] News Article Format April 2001 qtext = NO-WS-CTL / ; all of except %d33 / ; SP, HTAB, "\" and DQUOTE %d35-91 / %d93-126 / UTF8-xtra-char utext = NO-WS-CTL / ; Non white space controls %d33-126 / ; The rest of US-ASCII UTF8-xtra-char strict-text = %d1-9 / ; text restricted to %d11-12 / ; US-ASCII %d14-127 strict-qtext = NO-WS-CTL / ; qtext restricted to %d33 / ; US-ASCII %d35-91 / %d93-127 strict-quoted-pair = "\" strict-text strict-quoted-string = [CFWS] DQUOTE *([FWS] (strict-qtext / strict-quoted-pair)) [FWS] DQUOTE [CFWS] NOTE: There are sequences of octets which cannot legitimately occur in UTF-8, even a few permitted by the above syntax. These SHOULD NOT be generated by posting agents but, where they occur inadavertently, they SHOULD be passed on untouched by other agents. Wherever in this standard the syntax is stated to be taken from [MESSFOR], it is to be understood as the syntax defined by [MESSFOR] after making the above changes, but NOT including any syntax defined in section 4 ("Obsolete syntax") of [MESSFOR]. Software compliant with this standard MUST NOT generate any of the syntactic forms defined in that Obsolete Syntax, although it MAY accept such syntactic forms. Certain syntax from the MIME specifications [RFC 2045] et seq is also considered a part of this standard (see 6.21). The following syntactic forms, taken from [RFC 2234] or from [MESSFOR], are repeated here for convenience only: ALPHA = %x41-5A / ; A-Z %x61-7A ; a-z CR = %x0D ; carriage return CRLF = CR LF DIGIT = %x30-39 ; 0-9 HTAB = %x09 ; horizontal tab LF = %x0A ; line feed SP = %x20 ; space NO-WS-CTL = %d1-8 / ; US-ASCII control characters %d11 / ; which do not include the %d12 / ; carriage return, line feed, %d14-31 / ; and whitespace characters %d127 WSP = SP / HTAB ; Whitespace characters C. H. Lindsey [Page 11] News Article Format April 2001 FWS = ([*WSP CRLF] 1*WSP); Folding whitespace atext = ALPHA / DIGIT / "!" 
/ "#" / ; Any character except "$" / "%" / ; controls SP, and specials. "&" / "'" / ; Used for atoms "*" / "+" / "-" / "/" / "=" / "?" / "^" / "_" / "`" / "}" / "|" / "}" / "~" atom = [CFWS] 1*atext [CFWS] dot-atom = [CFWS] dot-atom-text [CFWS] dot-atom-text = 1*atext *( "." 1*atext ) comment = "(" *([FWS] (ctext / quoted-pair / comment)) [FWS] ")" CFWS = *([FWS] comment) (([FWS] comment) / FWS ) DQUOTE = %d34 ; quote mark quoted-pair = "\" text quoted-string = [CFWS] DQUOTE *([FWS] (qtext / quoted-pair)) [FWS] DQUOTE [CFWS] unstructured = *( [FWS] utext ) [FWS] NOTE: CFWS occurs at many places in the syntax in order to allow comments and extra whitespace to be inserted almost anywhere. The syntax is in fact ambiguous insofar as it may be impossible to tell in which of several possible ways a given comment or WS was produced. However, this does not lead to semantic ambiguity because, unless specifically stated otherwise, the presence of absence of a comment or additional WS has no semantic meaning and, in particular, it is a matter of indifference whether it forms a part of the syntactic construct preceding it or the one following it. NOTE: Following [RFC 2234], literal text included in the syntax is to be regarded as case-insensitive. However, in contradistinction to [MESSFOR], the Netnews protocols are sensitive to case in some instances (as in newsgroup names, some header parameters, etc.). Care has been taken to indicate this explicitly where required. The complete syntax defined in this standard is repeated, for convenience, in Appendix B. 2.5. Language Various constant strings in this standard, such as header names and month names, are derived from English words. Despite their derivation, these words do NOT change when the poster or reader employing them is interacting in a language other than English. Posting and reading agents MAY translate as appropriate in their interaction with the poster or reader, but the forms that actually C. H. 
   appear in articles MUST be the English-derived ones defined in this
   standard.

3. Changes to the existing protocols

   This standard prescribes many changes, clarifications and new
   features since the protocols described in [RFC 1036] and
   [Son-of-1036]. It is the intention that they can be assimilated into
   Usenet as it presently operates without major interruption to the
   service, though some of the new features may not begin to show
   benefit until they become widely implemented. This section
   summarizes the main changes, and comments on some features of the
   transition.

3.1. Principal Changes

   o The [MESSFOR] conventions for parenthesis-enclosed comments in
     headers are supported.

   o Whitespace is permitted in Newsgroups headers, permitting folding
     of such headers. Indeed, all news headers can now be folded.

   o An enhanced syntax for the Path header enables the injection
     point of and the route taken by an article to be determined with
     certainty.

   o Netnews is firmly established as an 8bit medium.

   o Large parts of MIME are recognised as an integral part of
     Netnews.

   o The charset for headers is always UTF-8. This will, inter alia,
     permit newsgroup-names with non-ASCII characters.

   o There is a new Control command 'mvgroup' to facilitate moving a
     group to a different place (name) in a hierarchy.

   o There are several new headers defined, such as Replaces and
     Author-Ids, leading to increased functionality.

   o There are numerous other small changes, clarifications and
     enhancements.

[Doubtless many other changes should be listed, but there is little
point in doing so until our text is nearing completion. The above gives
the flavour of what should be said. There should also be references to
Appendix A.3 and Appendix A.4]

3.2. Transitional Arrangements

   An important distinction must be made between serving and relaying
   agents, which are responsible for the distribution and storage of
   news articles, and user agents, which are responsible for
   interactions with users. It is important that the former should be
   upgraded to conform to this standard as soon as possible to provide
   the benefit of the enhanced facilities. Fortunately, the number of
   distinct implementations of such agents is rather small, at least so
   far as the main "backbone" of Usenet is concerned, and many of the
   new features are already supported. Contrariwise, there are a great
   number of implementations of user agents, installed on a vastly
   greater number of small sites. Therefore, the new functionality has
   been designed so that existing agents may continue to be used,
   although the full benefits may not be realised until a substantial
   proportion of them have been upgraded.

   In the list which follows, care has been taken to distinguish the
   implications for both kinds of agent.

   o [MESSFOR] style comments in headers do not affect serving and
     relaying agents (note that the Newsgroups and Path headers do not
     contain them). They are unlikely to hinder the proper display of
     headers in existing user agents except in the case of the
     References header in agents which thread articles. Therefore, it
     is provided that they SHOULD NOT be generated except where
     permitted by the previous standards.

   o Because of its importance to all serving agents, the extension
     permitting whitespace and folding in Newsgroups headers SHOULD NOT
     be used until it has been widely deployed amongst relaying agents.
     User agents are unaffected.

   o The new style of Path header is already consistent with the
     previous standards. However, the intention is that relaying
     agents should eventually reject articles in the old style, and so
     this should be offered as a configurable option for relaying
     agents. User agents are unaffected.

   o The vast majority of serving, relaying and transport agents are
     believed to be already 8bit clean (in the slightly restricted
     sense in which that term is used in the MIME standards). User
     agents that do not implement MIME may be disadvantaged, but no
     more so than at present when faced with 8bit characters (which
     currently abound in spite of the previous standards).

   o The introduction of MIME reflects a practice that is already
     widespread. Articles in strict compliance with the previous
     standards (using strict US-ASCII) will be unaffected. Many user
     agents already support it, at least to the extent of widely used
     charsets such as ISO-8859-1. Users expecting to read articles
     using the more exotic charsets will need to acquire suitable
     reading agents. It is not intended, in general, that any single
     user agent will be able to display every charset known to IANA,
     but all such agents MUST support US-ASCII. Serving and relaying
     agents are not affected.

   o The use of the UTF-8 charset for headers will not affect any
     existing usage, since US-ASCII is a strict subset of UTF-8.
     Insofar as newsgroup names containing non-ASCII characters can
     now be expected to arise, support from serving and relaying
     agents will be necessary. It is believed that the customary
     storage structure used by serving agents can already cope
     (perhaps not ideally) with such names. Note that it is not
     necessary for serving and relaying agents to understand all the
     characters available in UTF-8, though it is desirable for them to
     be displayable for diagnostic purposes via some escape mechanism
     using, for example, the visible subset of US-ASCII. For users
     expecting to use the more exotic charsets available under UTF-8,
     the remarks already made in connection with MIME will apply.

   o The new Control: mvgroup command will need to be implemented in
     serving agents.
     It SHOULD be used in conjunction with pairs of matching rmgroup
     and newgroup commands (injected shortly after the mvgroup) until
     such time as mvgroup is widely implemented. The new Replaces
     header is also effectively a Control command, and transitional
     arrangements are provided which should be used in the meantime.
     User agents are unaffected.

   o The headers newly introduced by this standard can safely be
     ignored by existing software, albeit with loss of the new
     functionality.

4. Basic Format

4.1. Syntax of News Articles

   The overall syntax of a news article is:

      article     = 1*header separator body
      header      = header-name ":" 1*SP header-content CRLF
      header-name = 1*name-character *( "-" 1*name-character )
      name-character
                  = ALPHA / DIGIT
      header-content
                  = USENET-header-content
                       *( [CFWS] ";" header-parameter ) /
                    other-header-content
      USENET-header-content
                  =
      other-header-content
                  =
      header-parameter
                  = USENET-header-parameter /
                    other-header-parameter
      USENET-header-parameter
                  =
      other-header-parameter
                  = attribute "=" value
      attribute   = USENET-token / iana-token / x-token
      value       = token / quoted-string
      USENET-token
                  =
      iana-token  =
      x-token     = [CFWS] "x-" token-core [CFWS]
      token       = [CFWS] token-core [CFWS]
      token-core  = 1*
      tspecials   = "(" / ")" / "<" / ">" / "@" /
                    "," / ";" / ":" / "\" / DQUOTE /
                    "/" / "[" / "]" / "?" / "="
      separator   = CRLF
      body        = *( *998text CRLF )

   An article consists of some headers followed by a body. An empty
   line separates the two. The headers contain structured information
   about the article and its transmission. A header begins with a
   header-name identifying it, and can be continued onto subsequent
   lines as described in section 4.2.3. The body is largely
   unstructured text significant only to the poster and the readers.
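
   As an illustration only, the overall structure just described can
   be sketched in Python. The function name split_article and the
   shape of its result are assumptions made for this sketch and are
   not defined by this standard. The sketch works on the canonical
   CRLF form, joins continuation lines (section 4.2.3) to the header
   they belong to, and makes no attempt to validate header-names or
   header-contents against the grammar.

```python
def split_article(raw: bytes):
    """Split a canonical (CRLF-terminated) article into headers and body.

    Returns (headers, body): headers is a list of (name, content) pairs
    of octets in their original order, and body is everything after the
    empty separator line. Hypothetical helper, for illustration only.
    """
    # The separator is a truly empty line, so the header block ends at
    # the first CRLF that is immediately followed by another CRLF.
    head, sep, body = raw.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("no empty separator line found")

    headers = []
    for line in head.split(b"\r\n"):
        if line[:1] in (b" ", b"\t"):
            # Continuation line: the leading WSP stays part of the
            # logical header line; only the CRLF is dropped.
            if not headers:
                raise ValueError("continuation line before any header")
            name, content = headers[-1]
            headers[-1] = (name, content + line)
        else:
            name, colon, content = line.partition(b":")
            if not colon:
                raise ValueError("header line without a colon")
            # Drop the SP(s) that follow the colon.
            headers.append((name, content.lstrip(b" ")))
    return headers, body
```

   Keeping the headers as an ordered list of pairs, rather than a
   mapping, preserves both their order and any repeated header-names,
   which a relaying agent is required to pass on unchanged.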

      NOTE: Terminology here follows the current custom in the news
      community, rather than the [MESSFOR] convention of referring to
      what is here called a "header" as a "header-field" or "field".

   Note that the separator line must be truly empty, not just a line
   containing white space. Further empty lines following it are part of
   the body, as are empty lines at the end of the article.

      NOTE: The syntax above defines the canonical form of a news
      article as a sequence of lines each terminated by CRLF. This
      does not prevent serving agents or transport agents from storing
      or handling the article in other formats (e.g. using a single LF
      in place of CRLF) so long as the overall effects achieved are as
      defined by this standard when operating on the canonical form.

4.2. Headers

4.2.1. Names and Contents

   Despite the restrictions on header-name syntax imposed by the
   grammar, relayers and reading agents SHOULD tolerate header names
   containing any US-ASCII printable character other than colon (":",
   ASCII 58).

[To bring it into line with as given in [MESSFOR].]

   Header-names SHOULD be either those for which a
   USENET-header-content is defined in this standard, or those defined
   in [MESSFOR], or those defined in any extension to either of these
   standards including, in particular, the Mime standards [RFC 2045] et
   seq., or experimental headers beginning with "X-" (as defined in
   4.2.2.1). Software SHOULD NOT attempt to interpret headers not
   described in this standard or in its extensions, but relaying agents
   MUST pass them on unaltered and reading agents MUST enable them to
   be displayed, at least optionally.

   The possibility of allowing header-parameters to appear in all
   headers is provided mainly for the purpose of allowing future
   extensions to existing headers, since only a very few
   USENET-header-parameters are actually defined in this standard.
   Observe that such header-parameters do not, in general, occur in
   headers defined in other standards, except for the Mime standards
   [RFC 2045] et seq. and their extensions. Nevertheless, compliant
   software MUST accept all such header-parameters in headers defined
   by this standard and its extensions (ignoring them if their meaning
   is unknown) and SHOULD accept (and ignore) them in all headers.

[but what about
      address     = mailbox / group
      group       = phrase ":" [mailbox-list] ";"
Does the following NOTE cover the situation?]

      NOTE: The presence of a ";" in a header-content does not
      indicate the presence of a header-parameter in the few
      situations where it can be parsed as part of some
      USENET-header-content or other-header-content. On the other
      hand, posting agents SHOULD NOT generate them (even those using
      x-tokens) except in those headers for which a
      USENET-header-parameter has been defined, or where that usage is
      permitted by some other standard (notably one of the Mime
      standards). This restriction is likely to be removed in a future
      version of this standard.

      NOTE: The given syntax is ambiguous insofar as a
      USENET-header-content that is defined to be <unstructured> could
      contain, within that <unstructured>, text of the form
      <*(";" header-parameter)>. The intention is therefore that any
      such apparent header-parameters are to be regarded as part of
      the <unstructured>. This standard therefore does not (and
      extensions to it SHOULD NOT) define any USENET-header-parameter
      to be associated with such an unstructured
      USENET-header-content.

   The order of headers in an article is not significant. However,
   posting agents are encouraged to put mandatory headers (section 5)
   first, followed by optional headers (section 6), followed by
   experimental headers and headers not defined in this standard or its
   extensions.
   Relaying agents MUST NOT change the order of the headers in an
   article, though they MAY add additional headers (notably local
   headers (4.2.2.3)), preferably either before or after all the
   existing ones.

   Header-names are case-insensitive. There is a preferred case
   convention, which posters and posting agents SHOULD use: each
   hyphen-separated "word" has its initial letter (if any) in uppercase
   and the rest in lowercase, except that some abbreviations have all
   letters uppercase (e.g. "Message-ID" and "MIME-Version"). The forms
   used in this standard are the preferred forms for the headers
   described herein. Relaying and reading agents MUST, however,
   tolerate articles not obeying this convention.

4.2.2. Header Properties

   There are four special properties that may apply to particular
   headers, namely: "experimental", "inheritable", "local", and
   "variant". When a header is defined, in this (or any future)
   standard, as having one (or possibly more) of these properties, it
   is subject to special treatment, as indicated below.

4.2.2.1. Experimental Headers

   Experimental headers are those whose header-names begin with "X-".
   They are to be used for experimental Netnews features, or for
   enabling additional material to be propagated with an article. There
   are no established headers that are considered experimental headers;
   an established header cannot be experimental.

      NOTE: Some such headers may eventually be adopted as standard by
      some extension to this standard, at which point they will lose
      their "X-" prefix.

4.2.2.2. Inheritable Headers

   Subject only to the overriding ability of the poster to determine
   the contents of the headers in a proto-article, headers with the
   inheritable property MUST be copied by followup agents (perhaps with
   some modification) into the followup article, and headers without
   that property MUST NOT be so copied.
   Examples include:

   o Newsgroups (5.5) - copied from the precursor, subject to any
     Followup-To header.

   o Subject (5.4) - modified by prefixing with "Re: ", but otherwise
     copied from the precursor.

   o References (6.10) - copied from the precursor, with the addition
     of the precursor's Message-ID.

   o Distribution (6.6) - copied from the precursor.

      NOTE: The Keywords header is not inheritable, though some older
      newsreaders treated it as such.

4.2.2.3. Local Headers

   Headers with the local property are significant only to a particular
   serving agent (or perhaps a cooperating group of such agents). They
   MAY be removed by relaying agents before propagation, and MUST be
   removed (and replaced as necessary) by serving agents when received.
   The replaced header MAY be placed anywhere within the headers
   (though placing it first is recommended). The principal example is:

   o Xref (6.16) - used to keep track of the article locators of
     crossposted articles so that newsreaders can mark such articles
     as read.

4.2.2.4. Variant Headers

   Headers with the variant property are modified as articles are
   propagated. The modified header MAY be placed anywhere within the
   headers (though placing it first is recommended). The principal
   example is:

   o Path (5.6) - augmented at each relaying agent that an article
     passes through.

4.2.3. White Space and Continuations

[The following text is taken from [MESSFOR], adapted to the different
terminology used for this standard.]

   Each header is logically a single line of characters comprising the
   header-name, the colon with its following SP, and the
   header-content. For convenience, however, the header-content can be
   split into a multiple line representation; this is called "folding".
   The general rule is that wherever this standard allows for FWS or
   CFWS (but not simply SP or HTAB) a CRLF may be inserted before any
   WSP. For
   example, the header:

      Approved: modname@modsite.example (Moderator of comp.foo.bar)

   can be represented as:

      Approved: modname@modsite.example
         (Moderator of comp.foo.bar)

      NOTE: Though header-contents are defined in such a way that
      folding can take place between many of the lexical tokens (and
      even within some of them), folding SHOULD be limited to placing
      the CRLF at higher-level syntactic breaks, and SHOULD also avoid
      leaving trailing WSP on the preceding line. For instance, if a
      header-content is defined as comma-separated values, it is
      recommended that folding occur after the comma separating the
      structured items, even if it is allowed elsewhere.

   Folding MUST NOT be carried out in such a way that any line of a
   header is made up entirely of WSP characters and nothing else. The
   colon following the header name on the first line MUST be followed
   by a WSP, even if the header is empty. If the header is not empty,
   at least some of the content MUST appear on the first line (this is
   to avoid the possibility of harm by any non-compliant agent that
   might eliminate a trailing SP). Posting agents MUST enforce these
   restrictions, but relaying agents SHOULD accept even articles that
   violate them.

      NOTE: This standard differs from [MESSFOR] in requiring that WSP
      following the colon (it was also an [RFC 1036] requirement).

   Posters and posting agents SHOULD use SP, not HTAB, where white
   space is desired in headers (some existing software expects this),
   and MUST use SP immediately following the colon after a header-name.
   Relaying agents SHOULD accept HTAB in all such cases, however.

   Since the white space beginning a continuation line remains a part
   of the logical line, headers can be "broken" into multiple lines
   only at FWS or CFWS. Posting agents Ought Not to break headers
   unnecessarily (but see 4.5).

4.2.4. Comments

   Strings of characters which are treated as comments may be included
   in header-contents wherever the syntactic element CFWS occurs. They
   consist of characters enclosed in parentheses. Such strings are
   considered comments so long as they do not appear within a
   quoted-string. Comments may be nested.

   A comment is normally used to provide some human readable
   informational text, except at the end of an address which contains
   no phrase, as in

      fred@foo.bar.example (Fred Bloggs)

   as opposed to

      "Fred Bloggs" <fred@foo.bar.example>.

   The former is a deprecated, but commonly encountered, usage and
   reading agents SHOULD take special note of such comments as
   indicating the name of the person whose address it is. In all other
   situations a comment is semantically interpreted as a single SP.
   Since a comment is allowed to contain FWS, folding is permitted
   within it as well as immediately preceding and immediately following
   it. Also note that, since quoted-pair is allowed in a comment, the
   parenthesis and backslash characters may appear in a comment so long
   as they appear as a quoted-pair. Semantically, the enclosing
   parentheses are not part of the comment content; the content is what
   is contained between the two parentheses.

   Since comments have not hitherto been permitted in news articles,
   except in a few specified places, posters and posting-agents SHOULD
   NOT insert them except in those places, namely following addresses
   in From and similar headers, and to indicate the name of the
   timezone in Date headers. However, compliant software MUST accept
   them in all places where they are syntactically allowed.

4.2.5. Undesirable Headers

   A header whose content is empty is said to be an empty header.
   Relaying and reading agents SHOULD NOT consider presence or absence
   of an empty header to alter the semantics of an article (although
   syntactic rules, such as requirements that certain header names
   appear at most once in an article, MUST still be satisfied). Posting
   and injecting agents SHOULD delete empty headers from articles
   before posting them; relaying agents MUST pass them untouched.

   Headers that merely state defaults explicitly (e.g., a Followup-To
   header with the same content as the Newsgroups header, or a Mime
   Content-Type header with contents "text/plain; charset=us-ascii") or
   state information that reading agents can typically determine easily
   themselves (e.g. the length of the body in octets) are redundant and
   posters and posting agents Ought Not to include them.

4.3. Body

4.3.1. Body Format Issues

   The body of an article SHOULD NOT be empty. A posting or injecting
   agent which does not reject such an article entirely SHOULD at least
   issue a warning message to the poster and supply a non-empty body.
   Note that the separator line MUST be present even if the body is
   empty.

      NOTE: Some existing news software is known to react badly to
      body-less articles, hence the request for posting and injecting
      agents to insert a body in such cases. The sentence "This
      article was probably generated by a buggy news reader" has
      traditionally been used in this situation.

   Note that an article body is a sequence of lines terminated by
   CRLFs, not arbitrary binary data, and in particular it MUST end with
   a CRLF. However, relaying agents SHOULD treat the body of an article
   as an uninterpreted sequence of octets (except as mandated by
   changes of CRLF representation and by control-message processing)
   and SHOULD avoid imposing constraints on it. See also section 4.5.

   Posters SHOULD avoid using control characters and escape sequences
   except for tab (ASCII 9), formfeed (ASCII 12) and, possibly,
   backspace (ASCII 8).
   Tab signifies sufficient horizontal white space to reach the next of
   a set of fixed positions; posters are warned that there is no
   standard set of positions, so tabs should be avoided if precise
   spacing is essential. Formfeed (which is sometimes referred to as
   the "spoiler character") signifies a point at which a reading agent
   Ought to pause and await reader interaction before displaying
   further text. Reading agents MUST NOT pass other control characters
   or escape sequences unaltered to an output device.

      NOTE: Backspace was historically used for underlining, done by
      an underscore (ASCII 95), a backspace, and a character, repeated
      for each character that should be underlined. Posters are warned
      that underlining is not available on all output devices or
      supported by all reading agents and is best not relied on for
      essential meaning.

4.3.2. Body Conventions

   A body is by default an uninterpreted sequence of octets for most of
   the purposes of this standard. However, a Mime Content-Type header
   may impose some structure or intended interpretation upon it, and
   may also specify the character set in accordance with which the
   octets are to be interpreted.

   It is a common practice for followup agents to enable the
   incorporation of the followed-up article (the "precursor") as a
   quotation. This SHOULD be done by prefacing each line of the quoted
   text (even if it is empty) with the character ">" (or perhaps with
   "> " in the case of a previously unquoted line). This will result in
   multiple levels of ">" when quoted content itself contains quoted
   content, and it will also facilitate the automatic analysis of
   articles.

      NOTE: Posters should edit quoted context to trim it down to the
      minimum necessary. However, followup agents Ought Not to attempt
      to enforce this beyond issuing a warning (past attempts to do so
      have been found to be notably counter-productive).
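
   The ">" quoting convention described above might be implemented
   along the following lines. This is a minimal Python sketch, not a
   prescribed algorithm: the function name quote_lines is
   hypothetical, the body is assumed to be already decoded text in
   canonical CRLF form, and the decision to prefix empty lines with a
   bare ">" (so as not to leave trailing white space) is a choice
   made for this sketch.

```python
def quote_lines(body_text: str) -> str:
    """Prefix every line of a precursor's body for use in a followup."""
    lines = body_text.split("\r\n")
    # A body ends with CRLF, so the final element after splitting is an
    # empty string that is a terminator, not a line of its own.
    if lines and lines[-1] == "":
        lines.pop()

    quoted = []
    for line in lines:
        if line == "" or line.startswith(">"):
            # Empty line, or a line that is already quoted: add one
            # more level with a bare ">".
            quoted.append(">" + line)
        else:
            # Previously unquoted line: use "> " for legibility.
            quoted.append("> " + line)
    return "".join(q + "\r\n" for q in quoted)
```

   Applied twice, the function yields ">> " on text that was
   originally unquoted, which gives the multiple levels of ">"
   mentioned above.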

   The followup agent SHOULD also precede the quoted content by an
   "attribution line" (however, readers are warned not to assume that
   they are accurate, especially within multiply nested quotations).
   The following convention for such lines, whilst not mandated by this
   standard, is intended to facilitate their automatic recognition and
   processing by sophisticated reading agents. The attribution SHOULD
   contain the name or the email address of the precursor's poster, as
   in

      Joe D. Bloggs wrote:

   or

      Helmut Schmidt schrieb:

   The attribution MAY contain also a single Newsgroup name (the one
   from which the followup is being made), the precursor's Message-ID
   and/or the precursor's Date and Time. Any of these that are present
   SHOULD precede the name and/or email address. However, the inclusion
   or not of such fields Ought always to be under the control of the
   poster.

   To enable this line, and the Message-ID and the Email address within
   it, to be recognised (for example to enable suitable reading agents
   to retrieve the precursor or email its poster by clicking on them),
   the following conventions SHOULD be observed:

   o The precursor's Message-ID SHOULD be enclosed within <...> or

   o The precursor's poster's Email address SHOULD be enclosed within
     <...>

   o The various fields may be separated by arbitrary text and they
     may be folded in the same way as headers, but attributions SHOULD
     always be terminated by a ":" followed by CRLF.

   Further examples:

      On comp.foo in <1234@bar.example> on 24 Dec 1997 16:40:20 +0000,
         Joe D. Bloggs wrote:

      Am 24. Dez 1997 schrieb Helmut Schmidt :

   A "personal signature" is a short closing text automatically added
   to the end of articles by posting agents, identifying the poster and
   giving his network addresses, etc. If a poster or posting agent does
   append such a signature to an article, it MUST be preceded with a
   delimiter line containing (only) two hyphens (ASCII 45) followed by
   one SP (ASCII 32).
   The signature is considered to extend from the last occurrence of
   that delimiter up to the end of the article (or up to the end of the
   part in the case of a multipart Mime body). Followup agents, when
   incorporating quoted text from a precursor, Ought Not to include the
   signature in the quotation. Posting agents Ought to discourage (at
   least with a warning) signatures of excessive length (4 lines is a
   commonly accepted limit).

4.4. Characters and Character Sets

   Transmission paths for news articles MUST treat news articles as
   uninterpreted sequences of octets, excluding the values 0 (ASCII
   NUL) and 13 and 10 (ASCII CR and LF, which MUST ONLY appear in the
   combination CRLF which denotes a line separator).

      NOTE: this corresponds to the range of octets permitted for Mime
      "8bit data" [RFC 2045]. Thus raw binary data cannot be
      transmitted in an article body except by the use of a
      Content-Transfer-Encoding such as base64.

   Character data is represented by octets in accordance with some
   encoding scheme (UTF-8 for headers, and determined by the
   Content-Type and Content-Transfer-Encoding headers for bodies).

   If it comes to a relaying agent's attention that it is being asked
   to pass an article using the Content-Transfer-Encoding "8bit" to a
   relaying agent that does not support it, it SHOULD report this error
   to its administrator. It MUST refuse to pass the article and MUST
   NOT re-encode it with different Mime encodings.

      NOTE: This strategy will do little harm. The target relaying
      agent is unlikely to be able to make use of the article on its
      own servers, and the usual flooding algorithm will likely find
      some alternative route to get the article to destinations where
      it is needed.

4.4.1. Character Sets within Article Headers

   Within article headers, characters are represented as octets
   according to the UTF-8 encoding scheme [ISO 10646] or [RFC 2279] and
   hence all the characters in the Universal Multiple-Octet Coded
   Character Set (UCS) [ISO 10646] (which is essentially a superset of
   Unicode [UNICODE] and expected to remain so) are potentially
   available. However, interpreting the octets directly as US-ASCII
   characters should ensure correct behaviour in most situations.

      NOTE: UTF-8 is an encoding for 16bit (and even 32bit) character
      sets with the property that any octet less than 128 immediately
      represents the corresponding US-ASCII character, thus ensuring
      upwards compatibility with previous practice. Non-ASCII
      characters from UCS are represented by sequences of octets
      satisfying the syntax of a UTF8-xtra-char (2.4). Only those
      octet sequences explicitly permitted by [RFC 2044] shall be
      used. UCS includes all characters from the ISO-8859 series of
      character sets [ISO 8859] (which includes all Greek and Arabic
      characters) as well as the more elaborate characters used in
      Japan and China. See the following section for the appropriate
      treatment of UCS characters by reading agents.

   Notwithstanding the great flexibility permitted by UTF-8, there is
   need for restraint in its use in order that the essential components
   of headers may be discerned using reading agents that cannot present
   the full UCS range. In particular, header-names and tokens MUST be
   in US-ASCII, and certain other components of headers, as defined
   elsewhere in this standard - notably msg-ids, date-times, dot-atoms,
   domains and path-identities - MUST be in US-ASCII. Comments, phrases
   (as in addresses) and unstructureds (as in Subject headers) MAY use
   the full range of UTF-8 characters. For newsgroup-names see 5.5.
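
   By way of illustration, a reading agent might separate the
   US-ASCII header-name from the UTF-8 header-content as in the
   following Python sketch. The function name decode_header_line is
   an assumption of this sketch. It applies only the two coarsest
   rules above (US-ASCII header-name, UTF-8 header-content) and does
   not check that msg-ids, date-times and similar components are
   confined to US-ASCII. Relaying agents would not use such a
   function to reject articles, since they are asked to pass
   irregular octet sequences on untouched.

```python
def decode_header_line(raw: bytes):
    """Decode one unfolded header line (given without its CRLF).

    Returns (name, content) as text. Raises UnicodeDecodeError, which
    is a subclass of ValueError, if the header-name is not US-ASCII or
    the header-content is not legitimate UTF-8.
    """
    name, colon, content = raw.partition(b":")
    if not colon:
        raise ValueError("header line without a colon")
    # header-names MUST be in US-ASCII: any octet of 128 or above
    # makes this decoding step fail.
    name_text = name.decode("ascii")
    # header-contents are UTF-8; Python's strict decoder rejects octet
    # sequences that cannot legitimately occur in UTF-8.
    content_text = content.decode("utf-8", errors="strict")
    # Drop the SP(s) that follow the colon.
    return name_text, content_text.lstrip(" ")
```

   Because every octet below 128 decodes to the corresponding
   US-ASCII character, a header containing only US-ASCII passes
   through this function unchanged.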

   Where the use of non-ASCII characters, encoded in UTF-8, is
   permitted as above, they MAY also be encoded using the Mime
   mechanism defined in [RFC 2047], but this usage is deprecated within
   news articles (even though it is required in mail messages) since it
   is less legible in older reading agents which support neither it nor
   UTF-8. Nevertheless, reading agents SHOULD support this usage, but
   only in those contexts explicitly mentioned in [RFC 2047].

4.4.2. Character Sets within Article Bodies

   Within article bodies, characters are represented as octets
   according to the encoding scheme implied by any
   Content-Transfer-Encoding and Content-Type headers [RFC 2045]. In
   the absence of such headers, reading agents cannot be relied upon to
   display correctly more than the US-ASCII characters.

      NOTE: Observe that reading agents are not forbidden to "guess",
      or to interpret as UTF-8 regardless, which would be the simplest
      course for them to take.

      NOTE: It is not expected that reading agents will necessarily be
      able to present characters in all possible character sets,
      although they MUST be able to present all US-ASCII characters.
      For example, a reading agent might be able to present only the
      ISO-8859-1 (Latin 1) characters [ISO 8859], in which case it
      Ought to present undisplayable characters using some distinctive
      glyph, or by exhibiting a suitable warning. Older reading agents
      that do not understand Mime headers or UTF-8 should be able to
      display bodies in US-ASCII (with some loss of human
      comprehensibility) except possibly when the
      Content-Transfer-Encoding is "8bit".

   Followup agents MUST be careful to apply appropriate encodings to
   the outbound followup. A followup to an article containing non-ASCII
   material is very likely to contain non-ASCII material itself.

4.5. Size Limits

   Posting agents SHOULD endeavour to keep all header lines, so far as
   is possible, within 79 characters by folding them at suitable places
   (see 4.2.3). However, posting agents MUST permit the poster to
   include longer headers if he so insists, and compliant software MUST
   support headers of at least 998 octets. Likewise, injecting agents
   SHOULD fold any headers generated automatically by themselves.
   Relaying agents MUST NOT fold headers (i.e. they must pass on the
   folding as received).

      NOTE: There is NO restriction on the number of lines into which
      a header may be split, and hence there is NO restriction on the
      total length of a header (in particular it may, by suitable
      folding, be made to exceed the 998 octets restriction pertaining
      to a single header line).

   The syntax provides for the lines of a body to be up to 998 octets
   in length, not including the CRLF. All software compliant with this
   standard MUST support lines of at least that length, both in headers
   and in bodies, and all such software SHOULD support lines of
   arbitrary length. In particular, relaying agents MUST transmit lines
   of arbitrary length without truncation or any other modification.

      NOTE: The limit of 998 octets is consistent with the
      corresponding limit in [MESSFOR].

[RFC8222? now expresses it as 1000 incl. CRLF]

   In plain-text messages (those with no Mime headers, or those with a
   Mime Content-Type of text/plain) posting agents Ought to endeavour
   to keep the length of body lines within some reasonable limit. The
   size of this limit is a matter of policy, the default being to keep
   within 79 characters at most, and preferably within 72 characters
   (to allow room for quoting in followups). Exceptionally, posting
   agents Ought Not to adjust the length of quoted lines in followups
   unless they are able to reformat them in a consistent manner.
   Moreover, posting agents MUST permit the poster to include longer
   lines if he so insists.
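
   The two kinds of limit in this section are measured differently:
   the 998 limit is counted in octets, whereas the policy limit is
   counted in characters. A posting agent might distinguish them
   roughly as in the Python sketch below. The names check_line,
   HARD_LIMIT_OCTETS and SOFT_LIMIT_CHARS are hypothetical, the line
   is assumed to be encoded as UTF-8, and counting code points is
   only an approximation to counting displayed characters. The result
   is advisory, since the poster must remain free to insist on longer
   lines.

```python
HARD_LIMIT_OCTETS = 998  # per line, not including the CRLF
SOFT_LIMIT_CHARS = 79    # default policy limit; 72 is preferred in bodies


def check_line(line: str, soft_limit: int = SOFT_LIMIT_CHARS) -> str:
    """Classify one line, given without its CRLF.

    Returns "over-998" if the encoded line exceeds the length that all
    compliant software must support, "long" if it merely exceeds the
    policy limit, and "ok" otherwise.
    """
    # The 998 limit is expressed in octets, so measure the encoded form
    # (UTF-8 is an assumption of this sketch).
    if len(line.encode("utf-8")) > HARD_LIMIT_OCTETS:
        return "over-998"
    # The policy limit is expressed in characters; len() counts code
    # points, which approximates what a reading agent will display.
    if len(line) > soft_limit:
        return "long"
    return "ok"
```

   For example, a line of 79 accented letters occupies 158 octets in
   UTF-8 but is still within the default policy limit, because that
   limit counts characters.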

      NOTE: Plain-text messages are intended to be displayed "as-is"
      without any special action (such as automatic line splitting) on
      the part of the recipient. The policy limit (e.g. 72 or 79)
      should be expressed as a number of characters (as they will be
      displayed by a reading agent) rather than as the number of
      octets used to encode them.

      NOTE: This standard provides no upper bound on the overall size
      of a single article, but neither does it forbid relaying agents
      from dropping articles of excessive length. It is, however,
      suggested that any limits thought appropriate by particular
      agents would be more appropriately expressed in megabytes than
      in kilobytes.

4.6. Example

   Here is a sample article:

      Path: server.example/unknown.site2.example@site2.example/
         relay.site.example/site.example/injector.site.example%jsmith
      Newsgroups: example.announce,example.chat
      Message-ID: <9urrt98y53@site.example>
      From: Ann Example
      Subject: Announcing a new sample article.
      Date: Fri, 27 Mar 1998 12:12:50 +1300
      Approved: example.announce moderator
      Followup-To: example.chat
      Reply-To: Ann Example
      Expires: Wed, 22 Apr 1998 12:12:50 -0700
      Organization: Site1, The Number one site for examples.
      User-Agent: ExampleNews/3.14 (Unix)
      Keywords: example, announcement, standards, RFC 1036, Usefor
      Summary: The URL for the next standard.

      Just a quick announcement that a new standard example article has
      been released; it is in the new USEFOR draft obtainable from
      ftp.ietf.org.

      Ann.

      -- 
      Ann Example Sample Poster to the Stars
      "The opinions in this article are bloody good ones" - J. Clarke.

5. Mandatory Headers

   An article MUST have one, and only one, of each of the following
   headers: Date, From, Message-ID, Subject, Newsgroups, Path.

   Note also that there are situations, discussed in the relevant parts
   of section 6, where References, Sender, or Approved headers are
   mandatory. In control messages, specific values are required for
   certain headers.
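
   The "one, and only one" rule lends itself to a simple mechanical
   check, sketched here in Python. The function name
   mandatory_header_problems and the wording of its messages are
   assumptions of this sketch. It takes the header-names of an
   article in any order, compares them case-insensitively (see
   4.2.1), and deliberately ignores the situations in which
   References, Sender or Approved become mandatory.

```python
from collections import Counter

# The six headers of which every article must have exactly one.
MANDATORY = ("Date", "From", "Message-ID", "Subject", "Newsgroups", "Path")


def mandatory_header_problems(header_names):
    """Return a list describing missing or repeated mandatory headers.

    An empty list means that each mandatory header occurs exactly once.
    """
    # Header-names are case-insensitive, so count them in lowercase.
    counts = Counter(name.lower() for name in header_names)
    problems = []
    for name in MANDATORY:
        occurrences = counts[name.lower()]
        if occurrences == 0:
            problems.append(f"missing {name}")
        elif occurrences > 1:
            problems.append(f"duplicate {name}")
    return problems
```

   An injecting agent could run such a check on a proto-article after
   supplying whatever mandatory headers the poster left out.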
For the overall syntax of headers, see section 4.1. In the discussions of the individual headers, the content of each is specified using the syntax notation. The convention used is that the content of, for example, the Subject header is defined as Subject-content.

A proto-article (see 8.2.1) may lack some of these mandatory headers, but they MUST then be supplied by the injecting agent.

5.1. Date

The Date header contains the date and time that the article was prepared by the poster ready for transmission and SHOULD express the poster's local time. The content syntax makes use of syntax defined in [MESSFOR].

   Date-content = date-time

NOTE: It is a useful convention to follow the date-time with a comment containing the time zone in human-readable form.

The use of folding in a date-time is deprecated, even though permitted by [MESSFOR].
[There is a specific RECOMMENDED about this in MESSFOR now.]

In order to prevent the reinjection of expired articles into the news stream, relaying and serving agents MUST refuse articles whose Date header predates the earliest articles of which they normally keep record, or which is more than 24 hours into the future (though they MAY use a margin less than that 24 hours). Relaying agents MUST NOT modify the Date header in transit.

5.1.1. Examples

   Date: Fri, 2 Apr 1999 20:20:51 -0500 (EST)
   Date: 26 May 1999 16:13 +0000

5.2. From

The From header contains the electronic address(es), and possibly the full name, of the article's author(s). The content syntax makes use of syntax defined in [MESSFOR], subject to the following revised definition of local-part.
   From-content = mailbox-list
   addr-spec = local-part "@" domain
   local-part = dot-atom / strict-quoted-string

NOTE: This syntax ensures that the local-part of an addr-spec is restricted to pure US-ASCII (and is thus in strict compliance with [MESSFOR]), whilst allowing any UTF-8 character to be used in a preceding quoted-string containing the author's full name. If some future extension to the Mail protocols should relax this restriction, one would expect the Netnews protocols to follow.

The mailbox in the From-content SHOULD be a valid address, belonging to the poster(s) of the article, or person or agent on whose behalf the post is being sent (see the Sender header, 6.2). When, for whatever reason, the poster does not wish to include such an address, the From-content SHOULD then be an address which ends in the top level domain of ".invalid" [RFC 2606].

NOTE: Since such addresses ending in ".invalid" are undeliverable, user agents Ought to warn any user attempting to reply to them and Ought Not, in any case, to attempt to deliver to them (since that would be pointless anyway). Whether or not a valid address can subsequently be extracted from such an address falls outside the scope of this standard (though it would be pointless to use a disguise so easily penetrable). Be warned also that some injecting agents that have authentication information may choose to replace the From-content based upon the authenticated identity.

5.2.1. Examples:

   From: John Smith
   From: "John Smith" , dave@isp.example
   From: "John D. Smith" , andrew@isp.example, fred@site2.example
   From: Jan Jones
   From: Jan Jones
   From: dave@isp.example (Dave Smith)

NOTE: the last example shows a now deprecated convention of putting an author's full name in a comment following the mailbox, rather than in a phrase at the start of that mailbox. Observe that the quotes around the "John D. Smith" example were required, on account of the '.'
character, and they would also have been required had any UTF8-xtra-char been present.

5.3. Message-ID

The Message-ID header contains the article's message identifier, a unique identifier distinguishing the article from every other article. The content syntax makes use of syntax defined in [MESSFOR], subject to the following revised definition of no-fold-quote.

   Message-ID-content = msg-id
   id-left = dot-atom-text / no-fold-quote
   no-fold-quote = DQUOTE *( strict-qtext / strict-quoted-pair ) DQUOTE

The msg-id MUST NOT be more than 250 octets in length.

NOTE: The syntax ensures that a msg-id is restricted to pure US-ASCII (and is thus in strict compliance with [MESSFOR]). The length restriction ensures that systems which accept message identifiers as a parameter when retrieving an article (e.g. [NNTP]) can rely on a bounded length. Observe that msg-id includes the '<' and '>'.
[Do something about whitespace in dot-atom-text and no-fold-quote.]

Following the provisions of [MESSFOR], an agent generating an article's message identifier MUST ensure that it is unique and that it is NEVER reused (either in Netnews or email). Moreover, even though commonly derived from the domain name of the originating site (and domain names are case-insensitive), a message identifier MUST NOT be altered in any way during transport, or when copied (as into a References header), and thus a simple (case-sensitive) comparison of octets will always suffice to recognise that same message identifier wherever it subsequently reappears.

NOTE: some old software may treat message identifiers that differ only in case within their id-right part as equivalent, and implementors of agents that generate message identifiers should be aware of this.

5.4. Subject

The Subject header contains a short string identifying the topic of the message.
This is an inheritable header (4.2.2.2) to be copied into the Subject header of any followup, in which case the new header-content SHOULD then default to the string "Re: " (a "back reference") followed by the contents of the pure-subject of the precursor. Any leading "Re: " in the pure-subject MUST be stripped.

   Subject-content = [ back-reference ] pure-subject
   pure-subject = 1*( [FWS] utext )
   back-reference = %x52.65.3A.20
                    ; which is a case-sensitive "Re: "

The pure-subject MUST NOT begin with "Re: ".

NOTE: The given syntax differs from that prescribed in [MESSFOR] insofar as it does not permit a header content to be completely empty, or to consist of WSP only (see remarks in 4.2.5 concerning undesirable headers).

Followup agents MAY remove instances of non-standard back-reference (such as "Re(2): ", "Re:", "RE: ", or "Sv: ") from the Subject-content when composing the subject of a followup and add a correct back-reference in front of the result.

NOTE: that would be "SHOULD remove instances" except that we cannot find a sufficiently robust and simple algorithm to do the necessary natural language processing.

Followup agents MUST NOT use any other string except "Re: " as a back reference. Specifically, a translation of "Re: " into a local language or usage MUST NOT be used.

NOTE: "Re" is an abbreviation for the Latin "In re", meaning "in the matter of", and not an abbreviation of "Reference" as is sometimes erroneously supposed.

Agents SHOULD NOT depend on nor enforce the use of back references by followup agents.

For compatibility with legacy news software the Subject-content of a control message (i.e. an article that also contains a Control header) MAY start with the string "cmsg ", and non-control messages MUST NOT start with the string "cmsg ". See also section 6.13.

5.4.1. Examples

In the following examples, please note that only "Re: " is mandated by this standard. "was: " is a convention used by many English-speaking posters to signal a change in subject matter. Software should be able to deduce this information from References.

   Subject: Film at 11
   Subject: Re: Film at 11
   Subject: Godwin's law considered harmful (was: Film at 11)
   Subject: Godwin's law (was: Film at 11)
   Subject: Re: Godwin's law (was: Film at 11)

5.5. Newsgroups

The Newsgroups header's content specifies the newsgroup(s) in which the article is intended to appear. It is an inheritable header (4.2.2.2) which then becomes the default Newsgroups header of any followup, unless a Followup-To header is present to prescribe otherwise.

   Newsgroups-content = newsgroup-name
                        *( *FWS ng-delim *FWS newsgroup-name ) *FWS
   newsgroup-name = component *( "." component )
   component = component-start *( component-start / component-other )
   component-start = Un-lowercase / Un-digit
   Un-lowercase = /
   Un-digit = /
   component-other = "+" / "-" / "_"
   ng-delim = ","

where the items are as described in [UNICODE].

The inclusion of folding white space within a Newsgroups-content is a newly introduced feature in this standard. It MUST be accepted by all conforming implementations (relaying agents, serving agents and reading agents). Posting agents should be aware that such postings may be rejected by overly-critical old-style relaying agents. When a sufficient number of relaying agents are in conformance, posting agents SHOULD generate such whitespace in the form of so as to keep the length of lines in the relevant headers (notably Newsgroups and Followup-To) to no more than 79 characters (or other agreed policy limit - see 4.5). Before such critical mass occurs, injecting agents MAY reformat such headers by removing whitespace inserted by the posting agent, but relaying agents MUST NOT do so.

A newsgroup-name consists of one or more components.
Components MAY contain non-ASCII letters, but these MUST be encoded in UTF-8 and not according to [RFC 2047]. A component MUST contain at least one letter (and MUST, according to the syntax, begin with a letter or digit). Components SHOULD begin with a letter. Composite characters (made by overlaying one character with another) and format characters, as allowed in certain parts of Unicode and needed by certain languages, must use whatever canonical conventions apply to those parts of Unicode (such conventions are not defined in this Standard). The use of "_" in a component is deprecated. Serving agents MAY refuse to accept newsgroups using such a component.

NOTE: Components composed entirely of digits would cause problems for the commonly used implementation technique of using the component as the name of a directory, whilst also using sequential numbers to distinguish the articles within a group. Components containing other non-permitted characters could cause problems when newsgroup-names appear in URLs [RFC 1738] (for example an '@' character would prevent distinguishing between newsgroup-names and message identifiers).

NOTE: According to the syntax, uppercase letters cannot occur in newsgroup-names, but this standard imposes no requirement on software to check this condition, since it would be unreasonable to expect it to do so in parts of Unicode for which it was not configured (in general, a table lookup is required). Rather, it is the responsibility of those creating new newsgroups (7.1) not to violate it. It is, moreover, to be expected that a newsgroup created in violation of this condition will not be propagated particularly well.
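Those creating new newsgroups may find it convenient to check a proposed name mechanically. The following Python sketch is one possible reading of the component rules above, not a definitive implementation. It ASSUMES that "letter" means a character of Unicode general category Ll or Lo and that "digit" means category Nd; this draft's own definitions of those character sets are to be preferred wherever they differ. The function names are hypothetical, and neither the forbidden names of 5.5.1 nor any length policy is checked.

```python
import unicodedata


def _is_letter(ch: str) -> bool:
    # Assumption: non-uppercase letters are categories Ll and Lo.
    return unicodedata.category(ch) in ("Ll", "Lo")


def _is_digit(ch: str) -> bool:
    # Assumption: digits are category Nd.
    return unicodedata.category(ch) == "Nd"


def is_valid_component(component: str) -> bool:
    """Check one component against the syntax and the one-letter rule."""
    if not component:
        return False
    if not (_is_letter(component[0]) or _is_digit(component[0])):
        return False  # must begin with a letter or digit
    for ch in component:
        if not (_is_letter(ch) or _is_digit(ch) or ch in "+-_"):
            return False  # uppercase and other characters are excluded
    return any(_is_letter(ch) for ch in component)  # at least one letter


def is_valid_newsgroup_name(name: str) -> bool:
    """A newsgroup-name is one or more components separated by '.'."""
    return all(is_valid_component(c) for c in name.split("."))


if __name__ == "__main__":
    for candidate in ("example.announce", "Example.chat", "example.123"):
        print(candidate, is_valid_newsgroup_name(candidate))
```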
Whilst there is no longer any technical reason to limit the length of a component (formerly, it was limited to 14 characters) nor to limit the total length of a newsgroup-name, it should be noted that these names are also used in the newsgroups line (7.1.2) where an overall policy limit applies, and moreover excessively long names can be exceedingly inconvenient in practical use. Agencies responsible for individual hierarchies Ought therefore, as a matter of policy, to set reasonable limits for the length of a component and of a newsgroup- name. In the absence of such explicit policies, the default limits are 30 characters and 71 characters respectively. [If the checkpolicies proposal is included in the Standard, there should be a reference to it here.] NOTE: The newsgroup-name as encoded in UTF-8 should be regarded as the canonical form. Reading agents may convert it to whatever character set they are able to display (see 4.4.1) and serving agents may possibly need to convert it to some form more suitable as a filename. Simple algorithms for both kinds of conversion are readily available. Observe that the syntax does not allow comments within the Newsgroups header; this is to simplify processing by relaying and serving agents which have a requirement to process this header extremely rapidly. Posters SHOULD use only the names of existing newsgroups in the Newsgroups header. However, it is legitimate to cross-post to newsgroup(s) which do not exist on the posting agent's host, provided that at least one of the newsgroups DOES exist there, and followup agents SHOULD accept this (posting agents MAY accept it, but Ought at least to alert the poster to the situation and request confirmation). Relaying agents MUST NOT rewrite Newsgroups headers in any way, even if some or all of the newsgroups do not exist on the relaying agent's host. 
Serving agents MUST NOT create new newsgroups simply because an unrecognised newsgroup-name occurs in a Newsgroups header (see 7.1 for the correct method of newsgroup creation).

The Newsgroups header is intended for use in Netnews articles rather than in mail messages. It MAY be used in a mail message to indicate that it is a copy also posted to the listed newsgroups, but it SHOULD NOT be used in a mail-only reply to a Netnews article (thus the "inheritable" property of this header applies only to followups to a newsgroup, and not to followups to the poster). Moreover, if a newsgroup-name contains any non-ASCII character, it MAY be encoded using the mechanism defined in [RFC 2047] when sent by mail but, if it is subsequently returned to the Netnews environment, it MUST then be re-encoded into UTF-8.

5.5.1. Forbidden newsgroup names

The following forms of newsgroup-name MUST NOT be used except for the specific purposes indicated:

o  Newsgroup-names having only one component. These are reserved for newsgroups whose propagation is restricted to a single host or local network, and for pseudo-newsgroups such as "poster" (which has special meaning in the Followup-To header - see section 6.7), "junk" (often used by serving agents), "control" (likewise), "revise" and "repost" (which have special meanings in the Xref header - see 6.16)

o  Any newsgroup-name beginning with "control." (used as pseudo-newsgroups by many serving agents)

o  Any newsgroup-name containing the component "ctl" (likewise)

o  "to" or any newsgroup-name beginning with "to." (reserved for the ihave/sendme protocol described in section 7.6, and for test messages sent on an essentially point-to-point basis)

o  Any newsgroup-name containing the component "all" (because this is used as a wildcard in some implementations)

A newsgroup-name SHOULD NOT appear more than once in the Newsgroups header.
The order of newsgroup names in the Newsgroups header is not significant, except for determining which moderator to send the article to if one of the groups is moderated (see 8.2).

5.6. Path

The Path header shows the route taken by a message since its entry into the Netnews system. It is a variant header (4.2.2.4), each agent that processes an article being required to add one (or more) entries to it. This is primarily to enable relaying agents to avoid sending articles to sites already known to have them, in particular the site they came from, and additionally to permit tracing the route articles take in moving over the network, and for gathering Usenet statistics. Finally the presence of a '%' delimiter in the Path header can be used to identify an article injected in conformance with this standard.

5.6.1. Format

   Path-content = *( path-identity [FWS] delimiter [FWS] )
                  tail-entry *FWS
   path-identity = 1*( ALPHA / DIGIT / "-" / "." / ":" / "_" )
   delimiter = "/" / "?" / "%" / "," / "!"
   tail-entry = 1*( ALPHA / DIGIT / "-" / "." / ":" / "_" )

NOTE: A Path-content will inevitably contain at least one path-identity, except possibly in the case of a proto-article that has not yet been injected onto the network.

NOTE: Observe that the syntax does not allow comments within the Path header; this is to simplify processing by relaying and injecting agents which have a requirement to process this header extremely rapidly.

A relaying agent SHOULD NOT pass an article to another relaying agent whose path-identity (or some known alias thereof) already appears in the Path-content. Since the comparison may be either case sensitive or case insensitive, relaying agents SHOULD NOT generate a name which differs from that of another site only in terms of case.
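The following Python sketch shows one way a relaying agent might split a Path-content according to the syntax of 5.6.1 and decide whether a peer already appears in it. The function names are hypothetical. The comparison is made case-insensitively, which is one of the two possibilities mentioned above and is an assumption of this sketch, as is the exclusion of the tail-entry from the comparison (it is not a site name; see 5.6.3).

```python
import re


def parse_path(path_content: str):
    """Split a Path-content into ([(path-identity, delimiter), ...], tail-entry).

    Folding white space is simply removed first; white space on its own
    is not a delimiter, so a malformed path that relies on it will have
    its neighbouring names run together.
    """
    compact = re.sub(r"\s+", "", path_content)
    # The capturing group keeps the delimiters in the result, which then
    # alternates: identity, delimiter, identity, ..., tail-entry.
    parts = re.split(r"([/?%,!])", compact)
    identities = parts[0:-1:2]
    delimiters = parts[1::2]
    return list(zip(identities, delimiters)), parts[-1]


def already_in_path(path_content: str, peer_aliases) -> bool:
    """True if any known name of the peer is a path-identity in the path."""
    entries, _tail = parse_path(path_content)
    seen = {identity.lower() for identity, _delim in entries}
    return any(alias.lower() in seen for alias in peer_aliases)


if __name__ == "__main__":
    path = "relay.site.example/site.example/injector.site.example%jsmith"
    print(parse_path(path))
    print(already_in_path(path, ["SITE.example"]))  # the peer has it already
```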
A relaying agent MAY decline to accept an article if its own path-identity is already present in the Path-content or if the Path-content contains some path-identity whose articles the relaying agent does not want, as a matter of local policy.

NOTE: This last facility is sometimes used to detect and decline control messages (notably cancel messages) which have been deliberately seeded with a path-identity to be "aliased out" by sites not wishing to act upon them.

5.6.2. Adding a path-identity to the Path header

When an injecting, relaying or serving agent receives an article, it MUST prepend its own path-identity followed by a delimiter to the beginning of the Path-content. In addition, it SHOULD then add CRLF and WSP if it would otherwise result in a line longer than 79 characters.

The path-identity added MUST be unique to that agent. To this end it SHOULD be one of:

1. A fully qualified domain name (FQDN) associated (by the Internet DNS service [RFC 1034]) with an A record, which SHOULD identify the actual machine prepending this path-identity. Ideally, this FQDN should also be "mailable" in the sense that it enables the construction of a valid E-mail address of the form "usenet@" or "news@" [RFC 2142] whereby the administrators of that agent may be reached.

2. A fully qualified domain name (FQDN) associated (by the Internet DNS service) with an MX record which MUST then enable the construction of a valid E-mail address of the form "usenet@" or "news@" whereby the administrators of that agent may be reached.

3. A name registered previously in the UUCP maps database (found in the newsgroup comp.mail.maps), containing no '.' character.

4. An encoding of an IP address - [RFC 820] or [RFC 2373] (the requirement to be able to use an is the reason for including ':' as an allowed character within a path-identity).

5. A '.'
followed by an arbitrary name not in the UUCP maps database, but believed to be unique and registered at least with all sites immediately downstream from the given site.

Of the above options, nos. 1 to 3 are much to be preferred, unless there are strong technical reasons dictating otherwise. In particular, the injecting agent's path-identity MUST, as a special case, be an FQDN mailable address in the sense defined under option 1, or with an associated MX record as in option 2.

The injecting agent's path-identity MUST be followed by the special delimiter '%' which serves to separate the pre-injection and post-injection regions of the Path-content (see 5.6.3).

In the case of a relaying or serving agent, the delimiter is chosen as follows. When such an agent receives an article, it MUST establish the identity of the source and compare it with the leftmost path-identity of the Path-content. If it matches, a '/' should be used as the delimiter when prepending the agent's own path-identity. If it does not match then the agent should prepend two entries to the Path-content; firstly the true established path-identity of the source followed by a '?' delimiter, and then, to the left of that, the agent's own path-identity followed by a '/' delimiter as usual. This prepending of two entries SHOULD NOT be done if the provided and established identities match.

Any method of establishing the identity of the source may be used (but see 5.6.5 below), with the consideration that, in the event of problems, the agent concerned may be called upon to justify it.

NOTE: The use of the '%' delimiter marks the position of the injecting agent in the chain. In normal circumstances there should therefore be only one '%' delimiter present, and injecting agents MAY choose to reject proto-articles with a '%' already in them.
If, for whatever reason, more than one '%' is found, then the path-identity in front of the leftmost '%' is to be regarded as the true injecting agent.

5.6.3. The tail-entry

For historical reasons, the tail-entry (i.e. the rightmost entry in the Path-content) is regarded as a "user name", and therefore MUST NOT be interpreted as a site through which the article has already passed. Moreover, the Path-content is not an E-mail address and MUST NOT be used to contact the poster. Posting and/or injecting agents MAY place any string here. When it is not an actual user name, the string "not-for-mail" is often used, but in fact a simple "x" would be sufficient.

Often this field will be the only entry in the region (known as the pre-injection region) after the '%', although there may be entries corresponding to machines traversed between the posting agent and the injecting agent proper. In particular, injecting agents that receive articles from many sources MAY include information to establish the circumstances of the injection such as the identity of the source machine (especially if the Injector-Info header (6.19) is absent). Any such inclusion SHOULD NOT conflict with any genuine site identifier. The '!' delimiter may be used freely within the pre-injection region, although '/' and '?' are also appropriate if used correctly.

5.6.4. Delimiter Summary

A summary of the various delimiters. The name immediately to the left of the delimiter is always that of the machine which added the delimiter.

'/'  The name immediately to the right is known to be the identity of the machine from which the article was received (either because the entry was made by that machine and we have verified it, or because we have added it ourselves).

'?'
The name immediately to the right is the claimed identity of the machine from which the article was received, but we were unable to verify it (and have prepended our own view of where it came from, and then a '/').

'%'  Everything to the right is the pre-injection region followed by the tail-entry. The name on the left is the FQDN of the injecting agent. The presence of two '%'s in a path indicates a double-injection (see 8.2.2).

'!'  The name immediately to the right is unverified. The presence of a '!' to the left of the '%' indicates that the identity to the left is that of an old-style system not conformant with this standard.

','  Reserved for future use, treat as '/'.

Other  Old software may possibly use other delimiters, which should be treated as '!'. But note in particular that ':', '-' and '_' are components of names, not delimiters, and FWS on its own MUST NOT be used as the sole delimiter.

NOTE: Old Netnews relaying and injecting programs almost all delimit Path entries with the '!' delimiter, and these entries are not verified. As such, the presence of '%' as a delimiter will indicate that the article was injected by software conforming to this standard, and the presence of '!' as a delimiter to the left of a '%' will indicate that the message passed through systems developed prior to this standard. It is anticipated that relaying agents will reject articles in the old style once this new standard has been widely adopted.

5.6.5. Suggested Verification Methods

The following approaches for common transports are suggested in order to meet a site's verification obligations. They are not required, but following them should avoid the necessity for wasteful double-entry Path additions.

If the incoming article arrives through some TCP/IP protocol such as NNTP, the IP address of the source will be known, and will likely already have been checked against a list of known FQDNs or IP addresses that the receiving site has agreed to peer with (this will have involved a DNS lookup of a known FQDN, following CNAME chains as required, to find an A record containing that source IP).

1. Where the path-identity is an FQDN (or even an arbitrary name starting with a '.') it is now a simple matter to check that it is the proper FQDN for the source, or some known registered alias thereof. Alternatively, where the FQDN in the path-identity has an associated A record, an immediate DNS lookup as above can be used to verify it.

2. Where the path-identity is an encoding of an IP address which does not immediately match the known IP address of the source, a reverse-DNS (in-addr.arpa PTR record) lookup may be done on the provided address, followed by a regular DNS "A" record lookup on the returned name. There may be A records for several IP addresses, of which one should match the path-identity and another should match the source.

3. If the path-identity fails to match any known alias for the source (requiring the insertion of an extra path-identity for the true source followed by a '?'), simply doing a reverse DNS (PTR) lookup on the source IP address is not sufficient to generate the true FQDN. The returned name must be mapped back to A records to assure it matches the source's IP address.

If the incoming article arrives through some other protocol, such as UUCP, that protocol MUST include a means of verifying the source site. In UUCP implementations, commonly each incoming connection has a unique login name and password, and that login name (or some alias registered for it) would be expected as the path-identity.

[The above description may still contain more detail than we would wish.
My aim so far was to retain everything in Brad's original, but expressed in a more palatable manner. We can now decide how much of it we want to keep.]

5.6.6. Example

   Path: foo.isp.example/
    .foo-server/bar.isp.example?10.123.12.2/old.site.example!
    barbaz/baz.isp.example%dialup123.baz.isp.example!x

NOTE: That article was injected into the news stream by baz.isp.example (complaints may be addressed to usenet@baz.isp.example). The injector has taken care to record that it got it from dialup123.baz.isp.example. "x" is the default tail entry, though sometimes a real userid is put there.

The article was relayed, perhaps by UUCP, to the machine known in the UUCP maps database as "barbaz".

Barbaz relayed it to old.site.example, which does not yet conform to this standard (hence the '!' delimiter). So one cannot be sure that it really came from barbaz.

Old.site.example relayed it to a site claiming to have the IP address [10.123.12.2], and claiming (by using the '/' delimiter) to have verified that it came from old.site.example.

[10.123.12.2] relayed it to ".foo-server" which, not being convinced that it truly came from [10.123.12.2], did a reverse lookup on the actual source and concluded it was known as bar.isp.example (that is not to say that [10.123.12.2] was not a correct IP address for bar.isp.example, but simply that that connection could not be substantiated by .foo-server). Observe that .foo-server has now added two entries to the Path.

".foo-server" is a locally significant name (observe the presence of the '.') within the complex site of many machines run by foo.isp.example, so the latter should have no problem recognizing .foo-server and using a '/' delimiter.

Presumably foo.isp.example then delivered the article to its direct clients.

It appears that foo.isp.example and old.site.example decided to fold the line, on the grounds that it seemed to be getting a little too long.

6. Optional Headers

The headers appearing in this section have established meanings and MUST be interpreted according to the definitions given here. None of them is required to appear in every article but some of them are required in certain types of article, such as followups. Any header defined in this (or any other) standard MUST NOT appear more than once in an article unless specifically stated otherwise. Experimental headers (4.2.2.1) and headers defined by cooperating subnets are exempt from this requirement. See section 8 "Duties of Various Agents" for the full picture.

6.1. Reply-To

The Reply-To header specifies a reply address(es) to be used for personal replies for the author(s) of the article when this is different from the author's address(es) given in the From header. The content syntax makes use of syntax defined in [MESSFOR], but subject to the revised definition of local-part given in section 5.2.

   Reply-To-content = From-content ; see 5.2

In the absence of Reply-To, the reply address(es) is the address(es) in the From header. For this reason a Reply-To SHOULD NOT be included if it just duplicates the From header.

NOTE: Use of a Reply-To header is preferable to including a similar request in the article body, because reply agents can take account of Reply-To automatically.

An address of "<>" in the Reply-To header MAY be used to indicate that the poster does not wish to receive email replies.

6.1.1. Examples

   Reply-To: John Smith
   Reply-To: John Smith , dave@isp.example
   Reply-To: John Smith ,andrew@isp.example, fred@site2.example
   Reply-To: Please do not reply <>

6.2. Sender

The Sender header specifies the mailbox of the entity which actually sent this article, if that entity is different from that given in the From header or if more than one address appears in the From header. This header SHOULD NOT appear in an article unless the sender is different from the author.
This header is appropriate for use by automatic article posters. The content syntax makes use of syntax defined in [MESSFOR].

   Sender-content = mailbox

6.3. Organization

The Organization header is a short phrase identifying the author's organization.

   Organization-content = 1*( [FWS] utext )

NOTE: Posting and injecting agents are discouraged from providing a default value for this header unless it is acceptable to all posters using those agents. Unless this header contains useful information (including some indication of the author's physical location) posters are discouraged from including it.

6.4. Keywords

The Keywords field contains a comma separated list of important words and phrases intended to describe some aspect of the content of the article. The content syntax makes use of syntax defined in [MESSFOR].

   Keywords-content = phrase *( "," phrase )

NOTE: The list is comma separated NOT space separated.

6.5. Summary

The Summary header is a short phrase summarizing the article's content.

   Summary-content = 1*( [FWS] utext )

The summary should be terse. Authors Ought to avoid trying to cram their entire article into the headers; even the simplest query usually benefits from a sentence or two of elaboration and context, and not all reading agents display all headers. On the other hand the summary should give more detail than the Subject.

6.6. Distribution

The Distribution header is an inheritable header (see 4.2.2.2) which specifies geographical or organizational limits to an article's propagation.

   Distribution-content = distribution *( dist-delim distribution )
   dist-delim = ","
   distribution = positive-distribution / negative-distribution
   positive-distribution = *FWS distribution-name *FWS
   negative-distribution = *FWS "!" distribution-name *FWS
   distribution-name = letter 1*distribution-rest
   distribution-rest = letter / "+" / "-" / "_"

Articles MUST NOT be passed between relaying agents or to serving agents unless the sending agent has been configured to supply and the receiving agent to receive BOTH of

(a) at least one of the newsgroups in the article's Newsgroups header, and

(b) at least one of the positive-distributions (if any) in the article's Distribution header and none of the negative-distributions.

Additionally, reading agents MAY be configured so that unwanted distributions do not get displayed.

NOTE: Although it would seem redundant to filter out unwanted distributions at both ends of a relaying link (and it is clearly more efficient to do so at the sending end), many sending sites have been reluctant, historically speaking, to apply such filters (except to ensure that distributions local to their own site or cooperating subnet did not escape); moreover they tended to configure their filters on an "all but those listed" basis, so that new and hitherto unheard of distributions would not be caught. Indeed many "hub" sites actually wanted to receive all possible distributions so that they could feed on to their clients in all possible geographical (or organizational) regions.

Therefore, it is desirable to provide facilities for rejecting unwanted distributions at the receiving end. Indeed, it may be simpler to do so locally than to inform each sending site of what is required, especially in the case of specialized distributions (for example for control messages, such as cancels from certain issuers) which might need to be added at short notice. The possibility for reading agents to filter distributions has been provided for the same reason.

Exceptionally, ALL relaying agents are deemed willing to supply or accept the distribution "world", and NO relaying agent should supply or accept the distribution "local".
   However, "world" SHOULD NEVER be mentioned explicitly since it is the default when the Distribution header is absent entirely. "All" MUST NOT be used as a distribution-name. Distribution-names SHOULD contain at least three characters, except when they are two-letter country names as in [ISO 3166]. Distribution-names are case-insensitive (i.e. "US", "Us" and "us" all specify the same distribution).

      NOTE: "Distribution: !us" can be used to cause an article to go to the whole of "world" except for "us".

   Posting agents Ought Not to provide a default Distribution header without giving the poster an opportunity to override it. Followup agents SHOULD initially supply the same Distribution header as found in the precursor.

6.7. Followup-To

   The Followup-To header specifies which newsgroup(s) followups should be posted to.

      Followup-To-content = Newsgroups-content / "poster"

   The syntax is the same as that of the Newsgroups-content, with the exception that the magic word "poster" is allowed. In the absence of a Followup-To header, the default newsgroup(s) for a followup are those in the Newsgroups header, and for this reason the Followup-To header SHOULD NOT be included if it just duplicates the Newsgroups header.

   A Followup-To header consisting of the magic word "poster" indicates that the author requests no followups to be sent in response to this article, only personal replies to the article's reply address.

      NOTE: An author who wishes both a personal reply and a followup post should include a Mail-Copies-To header (6.8).

6.8. Mail-Copies-To

   The Mail-Copies-To header indicates whether or not the poster wishes to have followups to an article emailed in addition to being posted to Netnews and, if so, establishes the address to which they should be sent.

   The content syntax makes use of syntax defined in [MESSFOR], but subject to the revised definition of local-part given in section 5.2.

      Mail-Copies-To-content = copy-addr / "nobody" / "poster"
      copy-addr           = mailbox

   The keyword "nobody" indicates that the author does not wish copies of any followup postings to be emailed. This indication is widely seen as a very strong wish, and is to be taken as the default when this header is absent.

   The keyword "poster" indicates that the author wishes a copy of any followup postings to be emailed to him.

   Otherwise, this header contains a copy-addr to which the author wishes a copy of any followup postings to be sent.

      NOTE: Some existing practice uses the keyword "never" in place of "nobody" and "always" in place of "poster". These usages are deprecated, but followup agents MAY observe them.

   The automatic actions of a followup agent in the various cases (subject to manual override by the user) are as follows:

   nobody (or when the header is absent)
      The followup agent SHOULD NOT, by default, email such a copy and Ought, especially when there is an explicit "nobody", to issue a warning and ask for confirmation if the user attempts to do so.

   poster
      The followup agent Ought, by default, to email a copy, which MUST then be sent to the address in the Reply-To header, and in the absence of that to the address(es) in the From header.

   copy-addr
      The followup agent Ought, by default, to email a copy, which MUST then be sent to the copy-addr.

      NOTE: This header is only relevant when posting followups to Netnews articles, and is to be ignored when sending pure email replies to the author, which are handled as prescribed under the Reply-To header (6.1). Whether or not this header will also find similar usage for replies to messages sent to mailing lists falls outside the scope of this standard.

   When emailing a copy, the followup agent SHOULD also include a "Posted-And-Mailed: yes" header (6.9).
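
   The default actions listed above for a followup agent can be sketched as a small decision function. This is only an illustration under stated assumptions: the function name and its parameters are hypothetical, header values are taken to be already extracted as plain strings, keywords are compared case-insensitively (as ABNF literals are), and the warning/confirmation step and any manual override by the user are left out.

```python
from typing import List, Optional


def mail_copy_recipients(mail_copies_to: Optional[str],
                         reply_to: Optional[str],
                         from_addr: str) -> List[str]:
    """Return the addresses a followup agent would, by default, email a copy to.

    An empty list means "do not email a copy by default".
    All names here are hypothetical; this is not part of the draft.
    """
    # An absent header is treated the same as "nobody".
    value = (mail_copies_to or "nobody").strip()
    keyword = value.lower()

    # Deprecated synonyms which followup agents MAY observe.
    if keyword == "never":
        keyword = "nobody"
    elif keyword == "always":
        keyword = "poster"

    if keyword == "nobody":
        return []
    if keyword == "poster":
        # Reply-To is used if present, otherwise the From address.
        return [reply_to] if reply_to else [from_addr]
    # Anything else is taken to be a copy-addr.
    return [value]
```

   For example, a precursor carrying "Mail-Copies-To: poster" and a Reply-To header yields the Reply-To address, while a precursor with no Mail-Copies-To header yields no recipients at all.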

      NOTE: In addition to the Posted-And-Mailed header, some followup agents also include within the body a mention that the article is both posted and mailed, for the benefit of reading agents that do not normally show that header.

6.9. Posted-And-Mailed

      Posted-And-Mailed-content = "yes" / "no"

   This header, when used with the "yes" keyword, indicates that the article has been both posted to the specified newsgroups and emailed. It SHOULD be used when replying to the author of an article to which this one is a followup (see the Mail-Copies-To header in section 6.8) and it MAY be used when any article is also mailed to a recipient(s) identified in a To and/or Cc header that is also present.

   The "no" keyword is included for the sake of completeness; it MAY be used to indicate the opposite state, but is redundant insofar as it only describes the default state when this header is absent.

   This header, if present, MUST be included in both the posted and emailed versions of the article. The Newsgroups header of the posted article SHOULD be included in the email version as recommended in section 5.5. All other headers defined in this standard (excluding variant and local headers, but including specifically the Message-ID header) MUST be identical in both the posted and mailed versions of the article, and so MUST the body.

      NOTE: This leaves open the question of whether a To or a Cc header should appear in the posted version. Naturally, a Bcc header should not appear, except in a form which indicates that there are additional unspecified recipients.

6.10. References

   The References header lists CFWS-separated message identifiers of precursors. The content syntax makes use of syntax defined in [MESSFOR].

      References-content  = msg-id *( CFWS msg-id )

      NOTE: This differs from the syntax of [MESSFOR] by requiring at least one CFWS between the msg-ids (this was an [RFC 1036] requirement).
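
   The rules of this section for forming a followup's References header (take the precursor's list, append the precursor's message identifier, and trim from the second entry onwards once more than 21 identifiers are present) can be sketched as follows. This is a minimal illustration, not a definitive implementation: the function name is hypothetical, and it assumes the msg-ids are separated by plain white space with no comments inside the CFWS.

```python
from typing import List, Optional

MAX_IDS = 21  # trimming threshold given in this section


def followup_references(precursor_refs: Optional[str],
                        precursor_msg_id: str) -> str:
    """Build the References-content for a followup (hypothetical helper).

    precursor_refs   -- References-content of the precursor, or None if absent
    precursor_msg_id -- message identifier of the precursor
    """
    # Assumption: identifiers are separated by white space only.
    ids: List[str] = precursor_refs.split() if precursor_refs else []
    ids.append(precursor_msg_id)

    if len(ids) > MAX_IDS:
        excess = len(ids) - MAX_IDS
        # Keep the first identifier, drop `excess` identifiers
        # starting with the second.
        ids = ids[:1] + ids[1 + excess:]

    return " ".join(ids)
```

   Keeping the first identifier preserves the root of the thread, while the most recent identifiers at the end of the list remain available to agents that thread on the immediate precursor.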

   A followup MUST have a References header, and an article that is not a followup MUST NOT have a References header. In a followup, if the precursor did not have a References header, the followup's References-content MUST be formed by the message identifier of the precursor. A followup to an article which had a References header MUST have a References header containing the precursor's References-content (subject to trimming as described below) plus the precursor's message identifier appended to the end of the list (separated from it by CFWS).

   Followup agents SHOULD NOT trim message identifiers out of a References header unless the number of message identifiers exceeds 21, at which time trimming SHOULD be done by removing sufficient identifiers starting with the second so as to bring the total down to 21. However, it would be wrong to assume that References headers containing more than 21 message identifiers will not occur.

6.10.1. Examples

      References:
      References:
      References: <222@site1.example> <87tfbyv@site7.example> <67jimf@site666.example>
      References:

6.11. Expires

   The Expires header specifies a date and time when the article is deemed to be no longer relevant and could usefully be removed ("expired"). The content syntax makes use of syntax defined in [MESSFOR].

      Expires-content     = date-time

   An Expires header should only be used in an article if the requested expiry time is earlier or later than the time typically to be expected for such articles. Local policy for each serving agent will dictate whether and when this header is obeyed and authors SHOULD NOT depend on it being completely followed.

6.12. Archive

   This optional header is a signal to automatic archival agents on whether this article is available for long-term storage.

      Archive-content     = [CFWS] ( "no" / "yes" ) [CFWS]
      Archive-header-parameter
                          = Filename-token "=" value
                          ; for USENET-header-parameters see 4.1
      Filename-token      = [CFWS] "filename" [CFWS]

   Agents which see "Archive: no" MUST NOT keep the article past the date when it would otherwise have expired. "Archive: yes" merely confirms what is already the default state. The optional Filename parameter MAY then be used to suggest a filename under which the article should be archived. Further extensions to this standard may provide additional parameters for administration of the archiving process.

6.13. Control

   The Control header marks the article as a control message, and specifies the desired actions (other than the usual ones of storing and/or relaying the article).

      Control-content     = CONTROL-verb CONTROL-argument
      CONTROL-verb        =
      verb                = token
      CONTROL-arguments   =
      arguments           = *( CFWS value ) ; see 4.1

[Observe that requires the use of a quoted-string if any tspecials or NON-ASCII characters are involved. This is a restriction on present usage, but follows Mime practice.]

   The verb indicates what action should be taken, and the argument(s) (if any) supply details. In some cases, the body of the article may also contain details. Section 7 describes all of the standard verbs.

   An article with a Control header MUST NOT also have a Replaces or Supersedes header.

      NOTE: The presence of a Subject header starting with the string "cmsg " and followed by a Control-content MUST NOT be construed, in the absence of a proper Control header, as a request to perform that control action (as may have occurred in some legacy software). See also section 5.4.

6.14. Approved

   The Approved header indicates the mailing addresses (and possibly the full names) of the persons or entities approving the article for posting.

      Approved-content    = From-content ; see 5.2

   Each mailbox contained in the Approved-content MUST be that of the person or entity in question, and one of those mailboxes MUST be that of the actual injector of the article.

[This is the start of an attempt to strengthen this header. It should be a TOSSable offence to put a dummy or invalid address in here. Later, when we have some form of authentication, I would hope to be able to say more.]

   An Approved header is required in all postings to moderated newsgroups. If this header is not present in such postings, then relaying and serving agents MUST reject the article. Please see section 8.2.2 for how injecting agents should treat postings to moderated groups that do not contain this header.

   An Approved header is also required in certain control messages, to reduce the risk of accidental posting of same; see the relevant parts of section 7.

6.15. Replaces / Supersedes

   These two headers contain one or more message identifiers that the current article is expected to replace or supersede. All listed articles MUST be treated as though a "cancel" control message had arrived for the article (but observe that a site MAY choose not to honour a "cancel" message, especially if its authenticity is in doubt).

6.15.1. Syntax and Semantics

   The Replaces and Supersedes headers specify articles to be cancelled on arrival of this one. The content syntax makes use of syntax defined in [MESSFOR].

      Replaces-content    = msg-id *( CFWS msg-id )
      Replaces-header-parameter
                          = Disposition-token "=" [CFWS]
                               ( Disposition-value /
                                 DQUOTE Disposition-value DQUOTE )
                               [CFWS]
                          ; for USENET-header-parameters see 4.1
      Disposition-token   = [CFWS] "disposition" [CFWS]
      Disposition-value   = "replace" / "revise" / "repost"
      Supersedes-content  = msg-id

      NOTE: There is no "c" in "Supersedes".

   If an article contains a Replaces header, then the old articles mentioned SHOULD simply be deleted by the serving agent, as in a cancel message (7.5), and the new article inserted into the system as any other new article would be.

   A Replaces-header-parameter is only meaningful when it occurs within a Replaces-content. If its Disposition-value is "revise" or "repost" (or if the Replaces-header-parameter is absent, then by default) reading agents Ought Not to show the article as an "unread" article unless the replaced article(s) were themselves all unread, except when the reader has configured his reading agent otherwise. Moreover, if a Disposition-value is "revise" or "repost", serving agents that generate a local Xref header MUST then include additional "revise" or "repost" information as set out in section 6.16.

      NOTE: A replacement with "disposition=replace" is intended to be used in the case of an article that is sufficiently different from its predecessors that it is advisable for readers to see it again. A replacement with "disposition=revise" is intended to be used in the case of a minor change, unworthy of being brought to the attention of a reader who has already read one of its predecessors. A replacement with "disposition=repost" is intended to be used in the case of an article identical to the one replaced (but possibly being reposted because the earlier one had likely expired).

      NOTE: A reader who elects to ignore all the articles available in a newsgroup (perhaps on the occasion of accessing that newsgroup for the first time) will likely have them all marked as "already read", unless the reading agent provides a distinct mark such as "never offered". This could lead to a later replacement with "revise" or "repost" for one of those articles being missed.
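
   The effect of the Disposition-value on a reading agent can be illustrated with a short sketch. It is offered only as an illustration under stated assumptions: the function name and arguments are hypothetical, the caller is assumed to have already resolved the effective disposition when the parameter is absent, and any override configured by the reader is not modelled.

```python
from typing import Iterable


def show_as_unread(disposition: str,
                   replaced_were_unread: Iterable[bool]) -> bool:
    """Decide whether a replacement article is presented as "unread".

    disposition          -- effective Disposition-value of the Replaces header
    replaced_were_unread -- one flag per replaced article known to the
                            reading agent, True if that article was unread
    (Hypothetical helper; not part of the draft.)
    """
    value = disposition.lower()
    if value not in ("replace", "revise", "repost"):
        raise ValueError("unknown Disposition-value: %r" % disposition)

    if value == "replace":
        # Sufficiently different that readers should see it again.
        return True

    # "revise" / "repost": unread only if every replaced article was
    # itself unread.  With no known predecessors, all() is True, so the
    # article is shown as unread.
    return all(replaced_were_unread)
```

   Thus a "revise" of an article that the reader has already seen stays out of the unread list, whereas the same replacement reaches a reader who never opened any of its predecessors.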

   The Supersedes header is obsolescent, is provided only for compatibility with existing software, and may be removed entirely in some future version of this standard. Its meaning is the same as that of a corresponding Replaces header with its Replaces-header-parameter set to "disposition=replace", and whenever a Supersedes header is provided a matching Replaces header SHOULD be provided as well. Observe that the Supersedes header makes provision for only a single msg-id.

   Until the Replaces header has become widely implemented, software SHOULD generate Replaces headers with only one msg-id, and cancel control messages SHOULD be issued if needed for further identifiers. Moreover, until that time, any article containing a Replaces header SHOULD contain also a Supersedes header (or alternatively be accompanied by a Control cancel message) for that same msg-id, to ensure that older systems still at least remove the predecessor.

   When a message contains both a Replaces and a Supersedes header they MUST be for the same msg-id. Furthermore, to resolve any doubt, the Replaces header shall be deemed to take priority.

   Whatever security or authentication mechanisms are required for a Control cancel message MUST also be required for an article with a Replaces or Supersedes header. In the absence or failure of such checks, the article SHOULD be discarded, or at most stored as an ordinary article.

[We can write something more constructive in here as soon as the situation with regard to cancel-locks and signed headers has been clarified.]

6.15.2. Message-ID version procedure

   Whilst this procedure is not essential for the operation of Netnews, it SHOULD be supported by all serving agents. However, for the procedure to work, all the msg-ids in the Replaces-content MUST be those of successive replacements of the same original article, and all be generated as described below.

[Whilst the procedure about to be described will undoubtedly work, it must be pointed out that life would be much simpler if there was only a single msg-id allowed in a Replaces-content.]

6.15.2.1. Message version numbers

   According to [MESSFOR], and omitting the obsolete forms, the syntax of the left hand side of a msg-id (the part before the "@") is given by:

      id-left-side        = dot-atom-text / no-fold-quote

   Consider this to be replaced by:

      id-left-side        = ( atom-text / no-fold-quote )
                               *( dollars-sequence )
      dollars-sequence    = version-number / random-dollars-sequence
      version-number      = "$" %d118 "=" 1*DIGIT ; $v=digits
      random-dollars-sequence
                          = "$" 1*atom-text

   Whilst this is admittedly ambiguous ("$" is already a possible value of atom-text) and does not in fact change what is allowable as an id-left-side, it does serve to allow dollars-sequences such as version-number (and any others that may be added by extensions to this standard) to be distinguished within a message identifier and utilized by agents which can understand them. Observe that no-fold-quotes cannot occur within a dollars-sequence.

   Posters and/or posting agents when replacing (or superseding) articles SHOULD arrange that the message identifier of the replacement follows the following convention, generating what are known as "version-number" message identifiers. This is to enable the new version of the article to be retrieved by its original message identifier, notably when it occurs in a URL of the form [RFC 1738].

   1. If the id-left-side of the most recent predecessor's message identifier contains a leftmost version-number "$v=", where is an integer version number, possibly followed by one or more random-dollars-sequences, the replacement message identifier should be obtained by replacing the with the integer and providing a different random-dollars-sequence(s). For example becomes .

   2.
      If the id-left-side of the predecessor's message identifier does not contain a version-number, the replacement message identifier should be obtained by appending the string "$v=1", preferably followed by a random-dollars-sequence(s), to that id-left-side. For example becomes .

   Any random-dollars-sequence so added MUST NOT start with "$=" for any letter .

      NOTE: The presence of a random-dollars-sequence following the version-number is intended to prevent a malicious poster from preempting the posting of a replacement article by guessing its likely message identifier.

   Attempts to fetch a replaced (or superseded) article by its message identifier SHOULD retrieve instead its most recent successor which has used the version-number convention. This is intended to ensure that "news:" URLs [RFC 1738] will continue to work even when an article has been replaced, but agents Ought then to draw the user's attention to the fact that the message identifier retrieved differed from that requested.

6.15.2.2. Implementation and Use Note

[Here is the implementation technique that we discussed, based on the use of a conventional History file. This is a sanity check for our own use, not intended to go in the final text.

There are two cases to consider:

A. Traditional implementations (e.g. CNEWS) where each History file line includes a full message-identifier plus an item for each group in which the article appears. Thus History file entries are of variable length, and it is impractical to update them in situ.

B. History files made up of fixed length records (e.g. as proposed for INN), which enables entries to be overwritten in situ. The History line typically contains a hash of the message identifier plus some pointer to an object representing the article as stored.

We consider the traditional case first:

1A.
    Ensure that the implementation of DBZ is not upset if the same key is attempted to be stored a second time, and that such a key always retrieves the latest record indexed by that key.

2A. Additions to the History file are always made at the end. Removals or changes to existing entries are only made by the expire program. An entry for a Replaced (or otherwise cancelled) article will remain until, first, the expire program removes the links to the articles that are no longer stored, and later on removes the entire entry according to its expiry date. For every entry containing a '$v=n' followed by random-dollars-sequences there will be an immediately following entry identical but for the omission of that '$v=n' and of the random-dollars-sequences. Thus there may be several entries with identical message-ids but, because of the change to DBZ just described, only the most recent will ever be seen except by programs that access the History file directly, rather than by its index.

3A. When an article is Replaced, at the same time as the successor article is entered into the History file, with '$v=7' say, a duplicate entry (same article list) is entered under the same key, modified by removing any leftmost '$v=n' and the following random-dollars-sequences from it.

For the fixed length implementations, these steps become:

1B. DBZ does not need to be changed.

2B. History file entries may be updated in situ. An entry for a Replaced (or otherwise cancelled) article can be overwritten with that for the new article (or with a suitable indication of cancellation). For every entry containing a '$v=n' followed by random-dollars-sequences there will always exist a second entry identical but for the omission of that '$v=n' and of the random-dollars-sequences, both entries pointing to the same article object.

3B.
    When an article is Replaced, at the same time as the successor article is entered into the History file, with '$v=7' say, the existing entry without the leftmost '$v=n' and the following random-dollars-sequences is overwritten (with the new article and new expiry date, after destroying the old article, of course). If no such entry exists, one is created.

From here on, the two cases are the same:

4. Provide a call to a routine which, if asked to retrieve any message identifier with '$v=n' and finding it missing (or rather linked to no stored groups), immediately tries again without the '$v=n' and its random-dollars-sequences.

   NOTE. We don't want this behaviour when checking whether we already have an article offered to us by IHAVE, only in response to an ARTICLE command. So this needs to be an extra call in DBZ, in addition to the 'fetch' or 'dbzfetch' calls, to be used in the proposed extension to the NNTP ARTICLE command. Observe that if the requested '$v=n' is present and linked to stored articles (for whatever reason) then you will be given exactly that version, even if later ones are stored as well.

5. NOTE that I have dropped the idea of having '$v=0', because you can never be sure that the very first issue of the FAQ used it, so you have to provide the versionless root as well. If someone asks for '$v=0' (or any '$v=n') the algorithm I gave will still find it via the root. So we don't care what people put in URLs.

6. You are supposed to cancel the replaced/superseded article. If you REALLY want to keep the old ones around a little longer, then this implementation will not work if you want the latest to be retrieved automatically - you will have to invent something much more complicated.

7. Having said all that, here follows a brief account of the same thing, but short enough to be included in our document (the convention being that implementation issues are hinted at, rather than being described in full detail).]
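
   The retry described in step 4 of the note above can be sketched as follows. This is a rough illustration only, under several assumptions: a plain dictionary stands in for the History file and its DBZ index, the function names are hypothetical, the message identifiers in the usage comments are invented, and the id-left-side is assumed not to be a no-fold-quote containing "@".

```python
import re
from typing import Dict, Optional

# Leftmost "$v=<digits>" in the id-left-side, together with everything
# that follows it up to (but not including) the "@".
_VERSION_TAIL = re.compile(r"\$v=\d+[^@]*(?=@)")


def root_message_id(msg_id: str) -> str:
    """Return the "root" form of a version-number message identifier."""
    return _VERSION_TAIL.sub("", msg_id, count=1)


def fetch_for_article_command(history: Dict[str, str],
                              msg_id: str) -> Optional[str]:
    """Look up an article for an ARTICLE-style request.

    `history` is a hypothetical stand-in for the real index, mapping a
    message identifier to the stored article.  This fallback is meant
    for retrieval only, not for checking an IHAVE offer.
    """
    article = history.get(msg_id)
    if article is not None:
        # The exact version requested is present: return exactly that.
        return article
    root = root_message_id(msg_id)
    if root != msg_id:
        # Retry without the "$v=n" and its random-dollars-sequences.
        return history.get(root)
    return None


# Usage with invented identifiers:
#   history = {"<faq@site.example>": "current FAQ"}
#   fetch_for_article_command(history, "<faq$v=1$zz@site.example>")
# finds the entry stored under the root "<faq@site.example>".
```

   Keeping the fallback in a separate routine, rather than inside the ordinary fetch, matches the remark in the note that the behaviour is wanted for the ARTICLE command but not when checking an article offered by IHAVE.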

   Typically, a news database will index a Replacement article both by its "version-number" message identifier (containing a "$v=" tag followed by a random-dollars-sequence) and by its "root" version (without the "$v=" tag or any following random-dollars-sequence). Thus when a request for an article comes in that is not present under the version-number requested, any article that is present and indexed by the corresponding root version can be retrieved instead. The indexing mechanism needs to be such that, although the root version may have at times referred to many different articles, it is always the current one that is retrieved.

      NOTE: The presence of a version-number in the message identifier of an article without a Replaces or Supersedes header causes no extra action (it is just an ordinary article). Observe also that if an article with the exact message identifier (even though it contains a version-number) is, for whatever reason, already present on the serving agent, that article will always be retrieved in preference to the one indexed by any root version.

6.15.2.3. The Message-Version NNTP extension

   The following Service Extension to the NNTP protocol is defined in accordance with the framework set out in [NNTP], and is to be registered with IANA.

      Name of the extension: Message-Version
      Extension Label (for the LIST EXTENSIONS command): MESSAGE-VERSION
      Additional keywords, syntax and parameters: None

   In a server supporting this extension, the behaviour of the ARTICLE, HEAD, BODY and STAT commands when the parameter is a is modified as follows. If the specified article is available on the server then it (or its Head, Body or Status as appropriate) is returned in the normal manner.
   Otherwise, if a leftmost id-left-side of the (the part before the '@') contains "$v=", where is an integer version number, that "$v=" and everything following it is stripped from that id-left-side and the article (Head, Body or Status) with the stripped is returned instead. Otherwise (no article is available under the original, or any stripped, ), a 430 response is given as usual.

      NOTE: If the client is concerned to know whether the article found was exactly the one requested or a replacement article corresponding to a stripped , then it has only to compare the requested with that returned in the 220 (221, 222, or 223) response.

   The intent of this extension is to enable the retrieval of the current version of an article (such as a regularly posted FAQ) referenced by a "news:" URL [RFC 1738] which quotes the of an earlier version.

      NOTE: This extension has no effect on the IHAVE command.

6.15.2.4. Examples

   Example 1. The first edition of a FAQ is posted with a message identifier of the form: . The next (but identical) version, a month later, has:

      Message-ID:
      Replaces: ; disposition=repost
      Supersedes:

   Observe the inclusion of a Supersedes header as well, it being presumed that the Replaces header was not yet widely implemented at that time. The next one, another month later (and with some significant changes justifying the use of "replace" rather than "repost") has:

      Message-ID:
      Replaces:
                ; disposition=replace
      Supersedes:

   The next one, another month later, has:

      Message-ID:
      Replaces:
                ; disposition=repost
      Supersedes:

   Note that the only reason to include more than one message identifier in the Replaces is in case a site had missed the previous Replacement. It is hardly necessary with such a long interval between the postings.

   Under the above, on systems using the version-number system (which is optional) requests for any message identifier in the chain will always return the most recent. As such the URL "news:examplegroup-faq@faq-site.example" will always work, making it suitable to appear in HTML documents.

   Example 2. A user posts a message to the net. She notices a typo and, 2 minutes later, posts with:

      Message-ID:
      Replaces: ; disposition=revise

   3 minutes later she sees another typo, and posts:

      Message-ID:
      Replaces: ; disposition=revise

   The two bad versions will be replaced with the 3rd, even if a site never sees the 2nd due to batching or feed problems (thus the use of two message identifiers is quite useful in this case, in contradistinction to the first example). Requests for the original will return the 3rd.

6.16. Xref

   The Xref header is a local header (4.2.2.3) which indicates where an article was filed by the last server to process it, and whether it is a Replacement (6.15) for an earlier article.

      Xref-content        = [CFWS] server-name 1*( CFWS location )
      server-name         = path-identity ; see 5.6.1
      location            = newsgroup-name ":" article-locator
                               [ CFWS ( "revise" / "repost" ) ":"
                                 article-locator ]
      article-locator     = 1*( %x21-7E )
                          ; US-ASCII printable characters

   The server-name is included so that software can determine which serving agent generated the header. The locations specify what newsgroups the article was filed under (which may differ from those in the Newsgroups header) and where it was filed under them. The exact form of an article-locator is implementation-specific.

      NOTE: The traditional form of an article-locator is a decimal number, with articles in each newsgroup numbered consecutively starting from 1. NNTP demands that such a model be provided, and much other software expects it, but it seems desirable to permit flexibility for unorthodox implementations.

   Whenever an Xref header is created by an agent for an article which includes a Replaces header with "disposition=revise" or "disposition=repost" (6.15), it SHOULD include, within the location field of each newsgroup in the Newsgroups header of whichever of the old articles referenced in that Replaces header is still current, a corresponding "revise:" or "repost:" for the oldest article known to be being replaced, where is the article-locator under which that oldest article was filed. If the Replaces header has a "disposition=replace" (explicit or implicit) the Xref header MUST NOT include any such reference to an .

      NOTE: This is to enable reading agents to avoid showing that article to users who have already read any of those older articles (see 6.15). Because several replacements for a given article may arrive in the period between attempts by a reader to read a given newsgroup, it is useful to include the oldest one in the Xref header. The information necessary to determine this article can be obtained from the Xref header of the current version of the article just before it is deleted. Observe that a server that never received one of the replaced articles can still generate suitable information from whichever earlier version it actually has. This is why it is useful for a Replaces header to mention more than one earlier article, especially when replacements are being issued in quick succession.

      NOTE: "revise" and "repost" are case-insensitive.

   An agent inserting an Xref header into an article MUST delete any previous Xref header(s).
   A relaying agent MAY delete it before relaying, but otherwise it SHOULD be ignored (and usually replaced) by any relaying or serving agent receiving it.

   An agent MUST use the same server-name in Xref headers as the path-identity it uses in Path headers.

6.17. Lines

   The Lines header indicates the number of lines in the body of the article.

      Lines-content       = [CFWS] 1*digit

   The line count includes all body lines, including the signature if any, including empty lines (if any) at the beginning or end of the body, and including the whole of all Mime message and multipart parts contained in the body (the single empty separator line between the headers and the body is not part of the body). The "body" here is the body as found in the posted article as transmitted by the posting agent.

   This header is to be regarded as obsolete, and it will likely be removed entirely in a future version of this standard. In the meantime, its use is deprecated.

6.18. User-Agent

   The User-Agent header contains information about the user agent (typically a newsreader) generating the article, for statistical purposes and tracing of standards violations to specific software needing correction. Although not one of the mandatory headers, posting agents SHOULD normally include it.

      User-Agent-content  = product-token *( CFWS product-token )
      product-token       = value ["/" product-version] ; see 4.1
      product-version     = value

   This header MAY contain multiple product-tokens identifying the agent and any subproducts which form a significant part of the posting agent, listed in order of their significance for identifying the application. Product-tokens should be short and to the point - they MUST NOT be used for information beyond the canonical name of the product and its version. Injecting agents MAY include product information for servers (such as "INN/1.7.2"), but serving and relaying agents MUST NOT generate or modify this header to list themselves.
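
   As an illustration of the User-Agent-content syntax above, the following sketch assembles a header line from a list of products. It is not a definitive implementation: the function names are hypothetical, comments are not generated, and "value" is assumed here to be either a MIME-style token or a quoted-string (the authoritative definition is the one in section 4.1).

```python
from typing import Iterable, List, Optional, Tuple

# Assumption: the MIME tspecials force the use of a quoted-string.
_TSPECIALS = set('()<>@,;:\\"/[]?=')


def _value(text: str) -> str:
    """Render text as a token if possible, else as a quoted-string."""
    needs_quoting = (not text) or any(
        ch in _TSPECIALS or not (32 < ord(ch) < 127) for ch in text
    )
    if not needs_quoting:
        return text
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def user_agent(products: Iterable[Tuple[str, Optional[str]]]) -> str:
    """Build a User-Agent header line from (name, version) pairs.

    Products are expected in order of significance; a version of None
    yields a product-token without the "/" product-version part.
    """
    parts: List[str] = []
    for name, version in products:
        token = _value(name)
        if version is not None:
            token += "/" + _value(version)
        parts.append(token)
    if not parts:
        raise ValueError("at least one product-token is required")
    return "User-Agent: " + " ".join(parts)
```

   For instance, the pairs ("Gnus", "5.4.64") and ("XEmacs", "20.3beta17") give "User-Agent: Gnus/5.4.64 XEmacs/20.3beta17", while a product name containing a space is emitted as a quoted-string.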

      NOTE: Variations from [RFC 2616] which describes a similar facility for the HTTP protocol:

      1. use of arbitrary text or octets from character sets other than US-ASCII in a product-token may require the use of a quoted-string,

      2. "{" and "}" are allowed in a value (product-token and product-version) in Netnews,

      3. UTF-8 replaces ISO-8859-1 as charset assumption.

      NOTE: Comments should be restricted to information regarding the product named to their left such as platform information and should be concise. Use as an advertising medium (in the mundane sense) is discouraged.

6.18.1. Examples

      User-Agent: tin/1.2-PL2
      User-Agent: tin/1.3-950621beta-PL0 (Unix)
      User-Agent: tin/unoff-1.3-BETA-970813 (UNIX) (Linux/2.0.30 (i486))
      User-Agent: tin/pre-1.4-971106 (UNIX) (Linux/2.0.30 (i486))
      User-Agent: Mozilla/4.02b7 (X11; I; en; HP-UX B.10.20 9000/712)
      User-Agent: Microsoft-Internet-News/4.70.1161
      User-Agent: Gnus/5.4.64 XEmacs/20.3beta17 ("Bucharest")
      User-Agent: Pluto/1.05h (RISC-OS/3.1) NewsHound/1.30
      User-Agent: inn/1.7.2
      User-Agent: telnet

      NOTE: This header supersedes the role performed redundantly by experimental headers such as X-Newsreader, X-Mailer, X-Posting-Agent, X-Http-User-Agent, and other headers previously used on Usenet for this purpose. Use of these experimental headers SHOULD be discontinued in favor of the single, standard User-Agent header which can be used freely both in Netnews and mail.

6.19. Injector-Info

   The Injector-Info header SHOULD be added to each article by the injecting agent in order to provide information as to how that article entered the Netnews system and to assist in tracing its true origin.
      Injector-Info-content = path-identity
      Injector-Info-header-parameter
                          = posting-host-parameter /
                            posting-account-parameter /
                            posting-sender-parameter /
                            posting-logging-parameter /
                            posting-date-parameter
                               ; for USENET-header-parameters see 4.1
      posting-host-parameter
                          = [CFWS] "posting-host" [CFWS] "=" [CFWS]
                            ( host-value / DQUOTE host-value DQUOTE )
                            [CFWS]
      host-value          = dot-atom /
                            [ dot-atom ":" ]
                            ( dotted-quad /      ; see [RFC 820]
                              ipv6-numeric )     ; see [RFC 2373]
      posting-account-parameter
                          = [CFWS] "posting-account" [CFWS] "=" value
      posting-sender-parameter
                          = [CFWS] "sender" [CFWS] "=" [CFWS]
                            ( sender-value / DQUOTE sender-value DQUOTE )
                            [CFWS]
      sender-value        = ( mailbox / "verified" )
      posting-logging-parameter
                          = [CFWS] "logging-data" [CFWS] "=" value
      posting-date-parameter
                          = [CFWS] "posting-date" [CFWS] "=" [CFWS]
                            ( date-value / DQUOTE date-value DQUOTE )
                            [CFWS]
      date-value          = 1*DIGIT [ ":" date-time ]

   An Injector-Info header MUST NOT be added to an article by any agent other than an injecting agent. Any Injector-Info header present when an article arrives at an injecting agent MUST be removed. In particular if, for some exceptional reason (8.2.2), an article gets injected twice, the Injector-Info header will always relate to the second injection.

   The path-identity MUST be the same as the path-identity prepended to the Path header by that same injecting agent which, following section 5.6.2, MUST therefore be a fully qualified domain name (FQDN) mailable address.

   Although comments and folding of white space are permitted throughout the Injector-Info-content specification, it is RECOMMENDED that folding is not used within any header-parameter (but only before or after the ";" separating parameters), and that comments are only used following the last parameter. It is also RECOMMENDED that such parameters as are present are included in the order in which they have been defined in the syntax above.
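   As an illustration only, the following Python sketch splits an Injector-Info header body into its path-identity and its parameters, and extracts the seconds count from a posting-date value. It assumes that the header has already been unfolded and that it contains no comments; the function names (split_unquoted, parse_injector_info, posting_datetime) are hypothetical and not defined by this standard.

```python
from datetime import datetime, timezone


def split_unquoted(text, sep=";"):
    """Split text on sep, ignoring separators inside double quotes."""
    parts, current, quoted = [], [], False
    for ch in text:
        if ch == '"':
            quoted = not quoted
            current.append(ch)
        elif ch == sep and not quoted:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


def parse_injector_info(content):
    """Return (path_identity, {attribute: value}) for a header body.

    Assumption: unfolded input without comments (CFWS is only
    treated as plain white space around tokens).
    """
    fields = [f.strip() for f in split_unquoted(content)]
    params = {}
    for field in fields[1:]:
        if not field:
            continue
        attribute, _, value = field.partition("=")
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] == '"':
            value = value[1:-1]       # drop the surrounding DQUOTEs
        params[attribute.strip().lower()] = value
    return fields[0], params


def posting_datetime(params):
    """Convert the leading seconds count of posting-date to UTC."""
    seconds = int(params["posting-date"].split(":", 1)[0])
    return datetime.fromtimestamp(seconds, tz=timezone.utc)
```

   For a posting-date whose seconds count is 965243133 (the value used in the example of 6.19.2), posting_datetime returns 19:05:33 UTC on 2 August 2000, which can be compared against the optional date-time that follows the colon.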
   An injecting agent SHOULD use a consistent form of this header for all articles emanating from the same or similar origins.

      NOTE: The effect of those recommendations is to facilitate the recognition of articles arising from certain designated origins (as in the so-called "killfiles" which are available in some reading agents). Observe that the order within the syntax has been chosen to place last those parameters which are most likely to change between successive articles posted from the same origin.

      NOTE: To comply with the overall "attribute = value" syntax of USENET-header-parameters, any value containing an ipv6-numeric, a date-time, a mailbox or any CFWS MUST be quoted using DQUOTEs (the quoting is optional in other cases).

      NOTE: This header is intended to replace various currently-used but nowhere-documented headers such as "NNTP-Posting-Host", "NNTP-Posting-Date" and "X-Trace". Any of these headers present when an article arrives at an injecting agent SHOULD also be removed as above.

6.19.1.  Usage of Injector-Info-header-parameters

   The purpose of these parameters is to enable the injecting agent to make assertions about the origin of the article, in fulfilment of its responsibilities towards the rest of the network as set out in section 8.2. These assertions can then be utilized as follows:

   1. To enable the administrator of the injecting agent to respond to complaints and queries concerning the article. For this purpose, the parameters included SHOULD be sufficient to enable the administrator to identify its true origin (which parameters are best suited to this purpose will vary with the nature of the injecting site and of its relationship to the posters who use it; there is no benefit in including parameters which contribute nothing to this aim).
      An administrator MAY, with those parameters where the syntax so allows, use cryptic notations interpretable only by himself if he considers it appropriate to protect the privacy of that origin.

   2. To enable relaying, serving and reading agents to recognize articles from origins which they might wish to reject, divert, or otherwise handle specially, for reasons of site policy.

   3. To enable the timely identification of spews of articles arising from a common origin.

   An injecting agent MUST NOT include any Injector-Info-header-parameter unless it has positive evidence of its correctness.

   An injecting agent MAY include other-header-parameters with x-token attributes which will assist in identifying the origin of the article.

      NOTE: It will be observed that the range of parameters provided allows much choice as to the precise manner in which an injecting agent fulfils its responsibilities. Whilst this standard does not seek to establish any preferences in this matter, administrators of injecting agents need to be aware of the privacy implications of the choices that they make.

6.19.1.1.  The posting-host-parameter

   If a dot-atom is present, it MUST be a FQDN identifying the specific host from which the injecting agent received the article. Alternatively, an IP address (dotted-quad or ipv6-numeric) identifies that host. If both forms are present, then they MUST identify the same host, or at least have done so at the time the article was injected.

      NOTE: It is commonly the case that this header identifies a dial-up point-of-presence, in which case a posting-account or logging-data may need to be consulted to find the true origin of the article.

6.19.1.2.  The posting-account-parameter

   This parameter identifies the source from which the injecting agent received the article.
   It MAY be in a cryptic notation understandable only by the administrator of the injecting agent, but it MUST be such that a given source always gives rise to the same posting-account (if the injecting agent is unable to meet that obligation, then it should use a posting-logging-parameter instead).

6.19.1.3.  The posting-sender-parameter

   This parameter identifies the mailbox of the verified sender of the article (alternatively, it uses the token "verified" to indicate that at least any addr-spec in the Sender header of the article, or in the From header if the Sender header is absent, is correct).

      NOTE: An injecting agent is unlikely to be able to make use of this parameter except in cases where it is running on a machine which is aware of the user-space in which the posting agent is operating. This parameter should be used in preference to a posting-account-parameter in such situations.

6.19.1.4.  The posting-logging-parameter

   This parameter contains information (typically a serial number or a session number) which will enable the true origin of the article to be determined by reference to logging information kept by the injecting agent.

6.19.1.5.  The posting-date-parameter

   This parameter identifies the time at which the article was injected (as distinct from the Date header, which indicates when it was written). It is in the form of the number of seconds elapsed since January 1st 1970, optionally followed by a date-time which MUST indicate the same time.

6.19.2.  Example

      Injector-Info: news2.isp.net; posting-host=modem-15.pop.isp.net;
         posting-account=client0002623; logging-data=2427;
         posting-date="965243133: Wed 2 Aug 2000 20:05:33 +0100 (BST)"

6.20.  Complaints-To

   The Complaints-To header is added to an article by an injecting agent in order to indicate the mailbox to which complaints concerning the poster of the article may be sent.

      Complaints-To-content = mailbox

   A Complaints-To header MUST NOT be added to an article by any agent other than an injecting agent. Any Complaints-To header present when an article arrives at an injecting agent MUST be removed. In particular if, for some exceptional reason (8.2.2), an article gets injected twice, the Complaints-To header will always relate to the second injection.

   The specified mailbox is for sending complaints concerning the behaviour of the poster of the article; it SHOULD NOT be used for matters concerning propagation, protocol problems, etc. In the absence of this header, such complaints should be sent to "usenet@" or "news@" followed by the path-identity which was prepended to the Path header by the injecting agent following section 5.6.2.

6.21.  MIME headers

6.21.1.  Syntax

   The following headers, as defined within [RFC 2045] and its extensions, may be used within articles conforming to this standard.

      MIME-Version:
      Content-Type:
      Content-Transfer-Encoding:
      Content-ID:
      Content-Description:
      Content-Disposition:
      Content-MD5:

   Insofar as the syntax for these headers, as given in [RFC 2045], does not specify precisely where whitespace and comments may occur (whether in the form of WSP, FWS or CFWS), the usage defined in this standard, and failing that in [MESSFOR], and failing that in [RFC 822] MUST be followed. In particular, there MUST NOT be any WSP between a header-name and the following colon and there MUST be a SP following that colon.

   The meaning of the various MIME headers is as defined in [RFC 2045] and [RFC 2046], and in extensions registered in accordance with [RFC 2048]. However, their usage is curtailed as described in the following sections.

6.21.2.  Content-Transfer-Encoding

   Posting agents SHOULD specify "Content-Transfer-Encoding: 8bit" for all articles not written in pure US-ASCII and not requiring full binary. They MAY use "8bit" encoding even when "7bit" encoding would have sufficed.
   They SHOULD specify "base64" when the content type implies binary (i.e. content intended for machine, rather than human, consumption).

      NOTE: If a future extension to the MIME standards were to provide a more compact encoding of binary suited to transport over an 8bit channel, it could be considered as an alternative to base64 once it had gained widespread acceptance.

   Posting agents SHOULD NOT specify encoding "quoted-printable", but reading agents MUST interpret that encoding correctly. Encoding "binary" MUST NOT be used (except in cooperating subnets with alternative transport arrangements) because this standard does not mandate a transport mechanism that could support it.

   Injecting and relaying agents MUST NOT change the encoding of articles passed to them. Gateways SHOULD NOT change the encoding unless absolutely necessary.

6.21.3.  Content-Type

   The Content-Type: "text/plain" is the default type for any news article, but the recommendations and limits on line lengths set out in section 4.5 Ought to be observed.

   The acceptability of other subtypes of Content-Type: "text" (such as "text/html") is a matter of policy (see 1.1), and posters Ought Not to use them unless established policy or custom in the particular hierarchies or groups involved so allows. Moreover, even in those cases, for the benefit of readers who see it only in its transmitted form, the material SHOULD be "pretty-printed" (for example by restricting its line length as above and by keeping sequences which control its layout or style separate from the meaningful text).

   In the same way, Content-Types requiring special processing for their display, such as "application", "image", "audio", "video" and "multipart/related" are discouraged except in groups specifically intended (by policy or custom) to include them.
   Exceptionally, those application types defined in [RFC 1847] and [RFC 2015] for use within "multipart/signed" articles, and the type "application/pgp-keys" (or other similar types containing digital certificates) may be used freely but, contrary to [RFC 2015] and unless the article is intended to be sent by mail also, the Content-Transfer-Encoding SHOULD be left as "8bit" (or "7bit" as appropriate).

   Reading agents SHOULD NOT, unless explicitly configured otherwise, act automatically on Application types which could change the state of that agent (e.g. by writing or modifying files), except in the case of those prescribed for use in control messages (7.1.2 and ).

6.21.3.1.  Message/partial

   The Content-Type "message/partial" MAY be used to split a long news article into several smaller ones, but this usage is discouraged on the grounds that modern transport agents should have no difficulty in handling articles of arbitrary length.

   However, if this feature is used, then the "id" parameter SHOULD be in the form of a unique message identifier (but different from that in the Message-ID header of any of the parts). Contrary to the requirements specified in [RFC 2046], the Transfer-Encoding SHOULD be set to "8bit" at least in each part that requires it. The second and subsequent parts SHOULD contain References headers referring to all the previous parts, thus enabling reading agents with threading capabilities to present them in the correct order. Reading agents MAY then provide a facility to recombine the parts into a single article (but this standard does not require them to do so).

6.21.3.2.  Message/rfc822

   The Content-Type "message/rfc822" should be used for the encapsulation (whether as part of another news article or, more usually, as part of a mail message) of complete news articles which have already been posted to Netnews and which are for the information of the recipient, and do not constitute a request to repost them.

   In the case where the encapsulated article has Content-Transfer-Encoding "8bit", it will be necessary to change that encoding if it is to be forwarded over some mail transport that only supports "7bit". However, this should not be necessary for any mail transport that supports the 8BITMIME feature [SMTP]. Moreover, where the headers of the encapsulated article contain any UTF8-xtra-chars (2.4), it may not be possible to transport them over mail transports even where 8BITMIME is supported. In such cases, it will be necessary to encode those headers as provided in [RFC 2047] (notwithstanding that such usage is deprecated for news headers by this standard, and actually forbidden in the case of the Newsgroups header).

   In the event that the encapsulated article has to be encoded for either of these reasons, it may be necessary to reverse that encoding if certain forms of digital signatures have been employed, or if the article is to be reintroduced into some Netnews system (however, in the latter case, the Content-Type "application/news-transmission" should have been used instead).

      NOTE: It is likely, though not guaranteed, that headers containing UTF8-xtra-chars will pass safely through mail transports supporting 8BITMIME if the "message/rfc822" object is sent as an attachment (i.e. as a part of a multipart) rather than as the top-level body of the mail message. Moreover, it is anticipated that future extensions to the mail standards will permit headers containing UTF8-xtra-chars to be carried without further ado over conforming transports.
[In fact, of current transports supporting 8BITMIME, only sendmail will have problems with UTF-8 in top-level headers.]

6.21.3.3.  Message/external-body

   The Content-Type "message/external-body" could be appropriate for texts which it would be uneconomic (in view of the likely readership) to distribute to the entire network.

6.21.3.4.  Multipart types

   The Content-Types "multipart/mixed", "multipart/parallel" and "multipart/signed" may be used freely in news articles. However, except where policy or custom so allows, the Content-Type: "multipart/alternative" SHOULD NOT be used, on account of the extra bandwidth consumed and the difficulty of quoting in followups, but reading agents MUST accept it.

   The Content-Type: "multipart/digest" is commended for any article composed of multiple messages more conveniently viewed as separate entities, thus enabling reading agents to move rapidly between them. The "boundary" should be composed of 28 hyphens (US-ASCII 45) (which makes each boundary delimiter 30 hyphens, or 32 for the final one) so as to enable reading agents which currently support the digest usage described in [RFC 1153] to continue to operate correctly.

[Actually, this conflicts with some present digest usage (such as the news.answers rules), but should still be the right way to go. There remains the possibility that future MIME-compliant readers could enable one to proceed directly to some particular message by clicking on it in a table of contents, but that feature is not yet supported by the current MIME standards.]

      NOTE: The various recommendations given above regarding the usage of particular Content-Types apply also to the individual parts of these multiparts.

6.21.4.  Character Sets

   In principle, any character set may be specified in the "charset=" parameter of a content type.
   However, only those character sets (and the corresponding parts of UTF-8) should be used which are appropriate for the customary language(s) of the hierarchy or newsgroup concerned (whose readers could be expected to possess agents capable of displaying them).

6.21.5.  Content Disposition

   Reading agents Ought to honour any Content-Disposition header that is provided (in particular, they Ought to display any part of a multipart for which the disposition is "inline", possibly distinguished from adjacent parts by some suitable separator). In the absence of such a header, the body of an article or any part of a multipart with Content-Type "text" Ought to be displayed inline.

   Followup agents which quote parts of a precursor (see 4.3.2) Ought initially to include all parts of the precursor that were displayed inline, as if they were a single part.

6.21.6.  Definition of some new Content-Types

   This standard defines (or redefines) several new Content-Types, which require to be registered with IANA as provided for in [RFC 2048]. For "application/news-groupinfo" see 7.1.2, for "application/news-checkgroups" see 7.4.1, and for "application/news-transmission" see the following section.

6.21.6.1.  Application/news-transmission

   The Content-Type "application/news-transmission" is intended for the encapsulation of complete news articles where the intention is that the recipient should then inject them into Netnews. This Application type SHOULD be used when mailing articles to moderators and to mail-to-news gateways (see 8.2.2).

      NOTE: The benefit of such encapsulation is that it removes possible conflict between news and email headers and it provides a convenient way of "tunnelling" a news article through a transport medium that does not support 8bit characters.
   The MIME content type definition of "application/news-transmission" is:

      MIME type name:           application
      MIME subtype name:        news-transmission
      Required parameters:      none
      Optional parameters:      usage=moderate
                                usage=inject
                                usage=relay
      Encoding considerations:  A transfer-encoding (such as Quoted-Printable or Base64) different from that of the article transmitted MAY be supplied (perhaps en route) to ensure correct transmission over some 7bit transport medium.
      Security considerations:  A news article may be a "control message", which could have effects on the recipient host's system beyond just storage of the article. However, such control messages also occur in normal news flow, so most hosts will already be suitably defended against undesired effects.
      Published specification:  [USEFOR]
      Body part:                A complete article or proto-article, ready for injection into Netnews, or a batch of such articles.

      NOTE: It is likely that the recipient of an "application/news-transmission" will be a specialised gateway (e.g. a moderator's submission address) able to accept articles with only one of the three usage parameters "moderate", "inject" and "relay", hence the reason why they are optional, being redundant in most situations. Nevertheless, they MAY be used to signify the originator's intention with regard to the transmission, so removing any possible doubt.

   When the parameter "relay" is used, or implied, the body part MAY be a batch of articles to be transmitted together, in which case the following syntax MUST be used.

      batch               = 1*( batch-header article )
      batch-header        = "#!" SP "rnews" SP article-size CRLF
      article-size        = 1*digit

   where the "rnews" is case-sensitive. Thus a batch is a sequence of articles, each prefixed by a header line that includes its size. The article-size is a decimal count of the octets in the article, counting each CRLF as one octet regardless of how it is actually represented.
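   The batch syntax can be illustrated by the following Python sketch, which is not part of this standard. It assumes that every line ending in the data is held as a single LF octet, so that the octet count of each article in memory equals its article-size (each CRLF being counted as one octet); data held with literal CRLF pairs would first need converting. The names make_batch and split_batch are hypothetical. The batch is treated purely as data and is never passed to a command interpreter.

```python
def make_batch(articles):
    """Join articles (bytes with LF line endings) into one batch."""
    out = bytearray()
    for article in articles:
        # batch-header = "#!" SP "rnews" SP article-size CRLF
        out += b"#! rnews %d\n" % len(article)
        out += article
    return bytes(out)


def split_batch(batch):
    """Split a batch (bytes with LF line endings) into its articles."""
    articles = []
    pos = 0
    while pos < len(batch):
        eol = batch.index(b"\n", pos)         # ValueError if no line end
        words = batch[pos:eol].split(b" ")
        if (len(words) != 3 or words[0] != b"#!"
                or words[1] != b"rnews"       # case-sensitive
                or not words[2].isdigit()):
            raise ValueError("malformed batch-header at offset %d" % pos)
        size = int(words[2])
        start = eol + 1
        if start + size > len(batch):
            raise ValueError("truncated article at offset %d" % start)
        articles.append(batch[start:start + size])
        pos = start + size
    return articles
```

   Because each article is located by its declared size rather than by scanning for the next "#! rnews" line, an article whose body happens to contain such a line is still split correctly.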
      NOTE: Despite the similarity of this format to an executable UNIX script, it is EXTREMELY unwise to feed such a batch into a command interpreter in anticipation of it running a command named "rnews"; the security implications of so doing would be disastrous.

6.21.6.2.  Message/news withdrawn

   The Content-Type "message/news", as previously registered with IANA, is hereby obsoleted and should be withdrawn. It was never widely implemented, and its default treatment as "application/octet-stream" by agents that did not recognise it was counterproductive. The Content-Type "message/rfc822" SHOULD be used in its place, as already described above.

6.22.  Obsolete Headers

   Persons writing new agents SHOULD ignore any former meanings of the following headers:

      Also-Control
      See-Also
      Article-Names
      Article-Updates

7.  Control Messages

   The following sections document the control messages. "Message" is used herein as a synonym for "article" unless context indicates otherwise.

   Group control messages are the sub-class of control messages that request some update to the configuration of the groups known to a serving agent, namely "newgroup", "rmgroup", "mvgroup" and "checkgroups", plus any others created by extensions to this standard.

   All of the group control messages MUST have an Approved header (6.14). Moreover, in those hierarchies where appropriate administrative agencies exist (see 1.1), group control messages Ought Not to be issued except as authorized by those agencies.

[They SHOULD also use one of the authentication mechanisms which we may define when we get a Round Tuit.]

   The Newsgroups header of each control message MUST include the newsgroup-name(s) for the group(s) affected (i.e. groups to be created, modified or removed, or containing articles to be canceled). This is to ensure that the message propagates to all sites which receive (or would receive) the group(s) in question.
   It MAY include other newsgroup-names so as to improve propagation (but this practice should be regarded as exceptional rather than normal).

   The descriptions below are generally phrased in terms suggesting mandatory actions, but any or all of these MAY be subject to local administrative restrictions, and MAY be denied or referred to an administrator for approval (either as a class or on a case-by-case basis). Analogously, where the description below specifies that a message or portion thereof is to be ignored, this action MAY include reporting it to an administrator.

   Relaying Agents MUST propagate even control messages that they do not understand.

   In the following sections, each type of control message is defined syntactically by defining its verb, its arguments, and possibly its body.

7.1.  The 'newgroup' Control Message

      newgroup-verb       = "newgroup"
      newgroup-arguments  = CFWS newsgroup-name [ CFWS newgroup-flag ]
      newgroup-flag       = "moderated"

   The "newgroup" control message requests that the specified group be created or changed. The newgroup-flag "moderated" is appended to mark the group as moderated. The absence of this flag marks the group as unmoderated. "Moderated" is the only such flag defined by this standard; other flags MAY be defined for use in cooperating subnets, but newgroup messages containing them MUST NOT be acted on outside of those subnets.

      NOTE: Specifically, some alternative flags such as "y" and "m", which are sent and recognised by some current software, are NOT part of this standard. Moreover, some existing implementations treat any flag other than "moderated" as indicating an unmoderated newsgroup. Both of these usages are contrary to this standard.

   The message body comprises or includes an "application/news-groupinfo" (7.1.2) part containing machine- and human-readable information about the group.
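   The treatment of the newgroup-flag described above might be sketched in Python as follows. This is an illustration only: it assumes that the content of the Control header contains no comments, that the literals "newgroup" and "moderated" are matched case-insensitively, and that the newsgroup-name has been validated elsewhere against section 5.5. The function name parse_newgroup is hypothetical.

```python
def parse_newgroup(control):
    """Parse a Control header body for a newgroup message.

    Returns (newsgroup_name, moderated).  Raises ValueError for
    anything else, including flags not defined by this standard,
    so that such messages are not acted upon.
    """
    words = control.split()
    if not words or words[0].lower() != "newgroup":
        raise ValueError("not a newgroup control message")
    if len(words) == 2:
        return words[1], False            # no flag: unmoderated
    if len(words) == 3 and words[2].lower() == "moderated":
        return words[1], True
    raise ValueError("missing newsgroup-name or unrecognised newgroup-flag")
```

   Rejecting an unknown flag outright, rather than treating it as "unmoderated", reflects the statement above that messages carrying other flags are not to be acted on outside the subnets that define them.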
   The newsgroup-name MUST conform to all requirements set out in section 5.5, and it is the responsibility of the newgroup message issuer to ensure this (since some of those requirements are hard to enforce mechanically). Moreover, the newsgroup-name Ought to conform to whatever policies have been established by the administrative agency, if any, for that hierarchy.

   The newgroup command is also used to update the newsgroups-line or the moderation status of a group.

7.1.1.  The Body of the 'newgroup' Control Message

   The body of the newgroup message contains the following subparts, preferably in the order shown:

   1. An "application/news-groupinfo" part (7.1.2) containing the name and newsgroups-line of the group(s). This part MUST be present.

   2. Other parts containing useful information about the background of the newgroup message (typically of type "text/plain").

   3. Parts containing initial articles for the newsgroup. See section 7.1.3 for details.

   In the event that there is only the single (i.e. application/news-groupinfo) subpart present, it will suffice to include a "Content-Type: application/news-groupinfo" amongst the headers of the control message. Otherwise, a "Content-Type: multipart/mixed" header will be needed, and each separate part will then need its own Content-Type header.

7.1.2.  Application/news-groupinfo

   The "application/news-groupinfo" body part contains brief information about a newsgroup, i.e. the group's name, its newsgroup-description and the moderation-flag.

      NOTE: The presence of the newsgroups-tag "For your newsgroups file:" is intended to make the whole newgroup message compatible with current practice as described in [Son-of-1036].
   The MIME content type definition of "application/news-groupinfo" is:

      MIME type name:           application
      MIME subtype name:        news-groupinfo
      Required parameters:      none
      Disposition:              by default, inline
      Encoding considerations:  "7bit" or "8bit" is sufficient and MUST be used to maintain compatibility.
      Security considerations:  this type MUST NOT be used except as part of a control message for the creation or modification of a Netnews newsgroup
      Published specification:  [USEFOR]

   The content of the "application/news-groupinfo" body part is defined as:

      groupinfo-body      = [ newsgroups-tag CRLF ]
                               1*( newsgroups-line CRLF )
      newsgroups-tag      = %x46.6F.72 SP %x79.6F.75.72 SP
                               %x6E.65.77.73.67.72.6F.75.70.73 SP
                               %x66.69.6C.65.3A
                               ; case sensitive
                               ; "For your newsgroups file:"
      newsgroups-line     = newsgroup-name
                               [ 1*HTAB newsgroup-description ]
                               [ 1*WSP moderation-flag ]
      newsgroup-description
                          = 1*( [WSP] utext)
      moderation-flag     = %x28.4D.6F.64.65.72.61.74.65.64.29
                               ; case sensitive "(Moderated)"

   The whole groupinfo-body is intended to be interpreted as a text written in the UTF-8 character set.

   The "application/news-groupinfo" is used in conjunction with the "newgroup" (7.1) and "mvgroup" (7.3) control messages. The newsgroup-name(s) in the newsgroups-line MUST agree with the newsgroup-name(s) in the "newgroup" or "mvgroup" control message (and thus there cannot be more than a single newsgroups-line except in the case of a "mvgroup" control message affecting a whole (sub)hierarchy). The Content-Type "application/news-groupinfo" MUST NOT be used except as a part of such control messages.

   Although optional, the newsgroups-tag SHOULD be included until such time as this standard has been widely adopted, to ensure compatibility with present practice.

   Moderated newsgroups MUST be marked by appending the case sensitive text " (Moderated)" at the end.
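   As an illustration of the groupinfo-body syntax, the following Python sketch separates each newsgroups-line into its name, description and moderation status. It is not part of this standard; it assumes that the body has already been decoded from UTF-8 into a string, and the function names parse_newsgroups_line and parse_groupinfo_body are hypothetical.

```python
import re

NEWSGROUPS_TAG = "For your newsgroups file:"          # case sensitive
_FLAG = re.compile(r"[ \t]+\(Moderated\)$")           # case sensitive


def parse_newsgroups_line(line):
    """Return (newsgroup_name, description_or_None, moderated)."""
    flag = _FLAG.search(line)
    moderated = flag is not None
    if flag:
        line = line[:flag.start()]        # strip " (Moderated)"
    # The name is separated from the description by one or more HTABs.
    name, _, description = line.partition("\t")
    return name, (description.lstrip("\t") or None), moderated


def parse_groupinfo_body(body):
    """Parse a decoded groupinfo-body into a list of line tuples."""
    lines = [line for line in body.splitlines() if line]
    if lines and lines[0] == NEWSGROUPS_TAG:
        lines = lines[1:]                 # the tag is optional
    return [parse_newsgroups_line(line) for line in lines]
```

   A caller would then compare the returned newsgroup-name(s) with those given in the Control header, since the two are required to agree.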
   It is NOT recommended that the moderator's email address be included in the newsgroup-description as has sometimes been done.

   Although, in accordance with [MESSFOR] and section 4.5 of this standard, a newsgroups-line could have a maximum length of 998 octets, as a matter of policy a far lower limit, expressed in characters, Ought to be set. The current convention is to limit its length so that the newsgroup-name, the HTAB(s) (interpreted as 8-character tabs that take one at least to column 24) and the newsgroup-description (excluding any moderation-flag) fit into 79 characters. However, this standard does not seek to enforce any such rule, and reading agents SHOULD therefore enable a newsgroups-line of any length to be displayed, e.g. by wrapping it as required.

      NOTE: The newsgroups-line is intended to provide a brief description of the newsgroup, written in the UTF-8 character set. Since newsgroup-names are required to be expressed in UTF-8 when they appear in headers, and since [NNTP] requires the use of UTF-8 when such a description is transmitted by the LIST NEWSGROUPS command, it would also be convenient for servers that keep a "newsgroups" file to store them in that form, so as to avoid unnecessary conversions.

7.1.3.  Initial Articles

   Some subparts of a "newgroup" or "mvgroup" control message MAY contain an initial set of articles to be posted to the affected newsgroup(s) as soon as it has been created. These parts are identified by having the Content-Type "application/news-transmission", possibly with the parameter "usage=inject". The body of each such part should be a complete proto-article, ready for posting. This feature is intended for the posting of charters, initial FAQs and the like to the newly formed group(s).

   The Newsgroups header of the proto-article MUST include the newsgroup-name of the newly created group (or one of them, if more than one). It MAY include other newsgroup-names.
If the proto-article includes a Message-ID header, the message identifier in it MUST be different from that of any existing article and from that of the control message as a whole, though it MAY be derived from the latter by appending "$p=" followed by an integer part number (see also 6.15.2.1) immediately after its id-left-side (i.e. before the "@"). Alternatively such a message identifier MAY be derived by the injecting agent when the proto-article is posted. The proto-article SHOULD include the header "Distribution: local".

The proto-article SHOULD be injected at the serving agent that processes the control message AFTER the newsgroup(s) in question have been created. It MUST NOT be injected if the newsgroup is not, in fact, created (for whatever reason). It MUST NOT be submitted to any relaying agent for transmission beyond the server(s) upon which the newsgroup creation has just been effected (in other words, it is to be treated as having a "Distribution: local" header, whether such a header is actually present or not).

NOTE: The "$p=" convention, if applied uniformly, should ensure that initial articles relayed beyond the local server in contravention of the above prohibition will not propagate in competition with similar copies injected at other local servers.

NOTE: It is not precluded that the proto-article is itself a control message or other type of special article, to be activated only upon creation of the new newsgroup. However, except as might arise from that possibility, any "application/news-transmission" within some nested "multipart/*" structure within the proto-article is not to be activated.

7.1.4. Example

A "newgroup" with bilingual charter and policy information:

      From: "example.all Administrator"
      Newsgroups: example.admin.groups,example.admin.announce
      Date: 27 Feb 1997 12:50:22 +0200
      Subject: cmsg newgroup example.admin.info moderated
      Approved: admin@example.invalid
      Control: newgroup example.admin.info moderated
      Message-ID:
      Content-Type: multipart/mixed; boundary="nxtprt"
      Content-Transfer-Encoding: 8bit

      This is a MIME control message.
      --nxtprt
      Content-Type: application/news-groupinfo

      For your newsgroups file:
      example.admin.info About the example.* groups (Moderated)

      --nxtprt
      Content-Type: application/news-transmission

      Newsgroups: example.admin.info
      From: "example.all Administrator"
      Subject: Charter for example.admin.info
      Message-ID:
      Distribution: local
      Content-Type: multipart/alternative ;
            differences = content-language ;
            boundary = nxtlang

      --nxtlang
      Content-Type: text/plain; charset=us-ascii
      Content-Transfer-Encoding: 7bit
      Content-Language: en

      The group example.admin.info contains regularly posted
      information on the example.* hierarchy.

      --nxtlang
      Content-Type: text/plain; charset=iso-8859-1
      Content-Transfer-Encoding: 8bit
      Content-Language: de

      Die Gruppe example.admin.info enthaelt regelmaessig versandte
      Informationen ueber die example.*-Hierarchie.

      --nxtlang--
      --nxtprt--

7.2. The 'rmgroup' Control Message

      rmgroup-verb        = "rmgroup"
      rmgroup-arguments   = CFWS newsgroup-name

The "rmgroup" control message requests that the specified group be removed from the list of valid groups. The Content-Type of the body is unspecified; it MAY contain anything, usually an explanatory text.

NOTE: It is entirely proper for a serving agent to retain the group until all the articles in it have expired, provided that it ceases to accept new articles.

7.2.1. Example

Plain "rmgroup":

      From: "example.all Administrator"
      Newsgroups: example.admin.groups, example.admin.announce
      Date: 4 Jul 1997 22:04 -0900 (PST)
      Subject: cmsg rmgroup example.admin.obsolete
      Message-ID:
      Approved: admin@example.invalid
      Control: rmgroup example.admin.obsolete

      The group example.admin.obsolete is obsolete. Please remove it
      from your system.

7.3. The 'mvgroup' Control Message

      mvgroup-verb        = "mvgroup"
      mvgroup-arguments   = CFWS ( mvgrp-groups / mvgrp-hrchy )
      mvgrp-groups        = newsgroup-name CFWS newsgroup-name
                               [ CFWS newgroup-flag ]
      mvgrp-hrchy         = groupnamepart ".*" CFWS groupnamepart ".*"
      groupnamepart       = newsgroup-name ; syntactically

7.3.1. Single group

The "mvgroup" control message requests that the first specified group be moved to the second specified group. The message body MUST contain an "application/news-groupinfo" part (7.1.2) containing machine- and human-readable information about the new group, and possibly other subparts as for a newgroup control message.

When this message is received, the new group SHOULD be created (and MUST be moderated if a newgroup-flag "moderated" is present) and all existing articles SHOULD be copied or moved to the new group; then the old, now empty group SHOULD be removed.

If the old group does not exist, the message is ignored unless the new group does not exist either, in which case the message SHOULD be treated as if it had been an equivalent "newgroup" message.

If both groups exist, the groups MAY be "merged". If this is done, it MUST be done correctly, i.e. implementations MUST take care that the messages in the group being deleted are renumbered accordingly to avoid overwriting articles in one group with those of the other, and that crossposted articles do not appear twice. Otherwise, the old group is just removed.

NOTE: Due to the severe difficulties of implementing this merging, those proposing to merge existing groups using this control message should be aware that it may not be implemented on many (if not most) sites, and should therefore be prepared for such disruption as may ensue.

An indication that the old group was replaced by the new group MAY be retained by the serving agent so that continuity of service may be maintained, and clients made aware of the new arrangements.
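
The four existence cases described in 7.3.1 can be read as a small decision table. The following Python sketch is illustrative only: the names MvAction and plan_mvgroup are hypothetical and are not defined by this standard, and whether a site actually merges two existing groups remains a local choice.

```python
from enum import Enum


class MvAction(Enum):
    """Hypothetical labels for the outcomes described in 7.3.1."""
    IGNORE = "ignore the message"
    TREAT_AS_NEWGROUP = "treat as an equivalent newgroup message"
    MOVE = "create new group, move articles, remove old group"
    MERGE_OR_REMOVE_OLD = "merge if supported, otherwise just remove old group"


def plan_mvgroup(old_exists: bool, new_exists: bool) -> MvAction:
    """Map the existence of the old and new groups to an action."""
    if not old_exists:
        # Old group missing: ignore, unless the new group is missing too.
        return MvAction.IGNORE if new_exists else MvAction.TREAT_AS_NEWGROUP
    if new_exists:
        # Both exist: merging is optional (MAY); otherwise the old group goes.
        return MvAction.MERGE_OR_REMOVE_OLD
    # Normal case: old group exists, new one does not yet.
    return MvAction.MOVE
```

A caller would consult its own group list for the two booleans, for example plan_mvgroup("example.oldgroup" in groups, "example.newgroup" in groups).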
NOTE: Some serving agents that use an "active" file permit an entry of the form "oldgroup xxx yyy =newgroup", which enables any articles arriving for oldgroup to be diverted to newgroup, and could even enable users already subscribed to oldgroup to receive articles from newgroup instead.

In all cases, the information conveyed in the "application/news-groupinfo" body part is applied to the new group.

Until most serving agents conform to this standard, whenever a mvgroup control message for a single group is issued, a corresponding pair of rmgroup and newgroup control messages SHOULD be issued a few days later.

7.3.2. Multiple Groups

If both names end with the character sequence ".*", the mvgroup message requests that a whole (sub)hierarchy be moved. The same procedure as for single groups (7.3.1) applies to each matched group, except that the moderation status of each old group MUST be copied to the corresponding new group.

To avoid recursion, the new groups' names MUST NEVER match the old groups' name pattern; i.e., moving a whole (sub)hierarchy to a subhierarchy of the original hierarchy is explicitly disallowed.

Until most serving agents conform to this standard, whenever a mvgroup control message for multiple groups is issued, a corresponding set of rmgroup and newgroup control messages for all the affected groups SHOULD be issued a few days later.

7.3.3. Examples

Plain "mvgroup":

      From: "example.all Administrator"
      Newsgroups: example.admin.groups, example.admin.announce
      Date: 30 Jul 1997 22:04 -0500 (EST)
      Subject: cmsg mvgroup example.oldgroup example.newgroup moderated
      Message-ID:
      Approved: admin@example.invalid
      Control: mvgroup example.oldgroup example.newgroup moderated
      Content-Type: multipart/mixed; boundary=nxt

      --nxt
      Content-Type: application/news-groupinfo

      For your newsgroups file:
      example.newgroup The new replacement group (Moderated)

      --nxt

      The moderated group example.oldgroup is replaced by
      example.newgroup. Please update your configuration.

      --nxt--

More complex "mvgroup" for a whole hierarchy:

The charter of the group example.talk.jokes contained a reference to example.talk.jokes.d, which is also being moved. So the charter is updated.

      From: "example.all Administrator"
      Newsgroups: example.admin.groups, example.admin.announce
      Date: 30 Jul 1997 22:04 -0500 (EST)
      Subject: cmsg mvgroup example.talk.* example.conversation.*
      Message-ID:
      Approved: admin@example.invalid
      Control: mvgroup example.talk.* example.conversation.*
      Content-Type: multipart/mixed; boundary=nxt

      --nxt
      Content-Type: application/news-groupinfo

      For your newsgroups file:
      example.conversation.boring Boring conversations
      example.conversation.better Better conversations
      example.conversation.jokes Funny stuff
      example.conversation.jokes.d Discussion of funny stuff

      --nxt
      Content-Type: application/news-transmission

      Newsgroups: example.conversation.jokes
      From: "example.all Administrator"
      Subject: Charter for renamed group example.conversation.jokes
      Distribution: local
      Message-ID:

      This group is to publish jokes and other funny stuff.
      Discussions about the articles posted here should be
      redirected to example.conversation.jokes.d; adding a
      Followup-To: header is recommended.

      --nxt--

7.4. The 'checkgroups' Control Message

The "checkgroups" control message contains a list of all the valid groups in a complete hierarchy.

      checkgroup-verb      = "checkgroups"
      checkgroup-arguments = [ chkscope ] [ chksernr ]
      chkscope             = 1*( CFWS ["!"] newsgroup-name )
      chksernr             = CFWS "#" 1*DIGIT

The chkscope parameter(s) specify the (sub)hierarchies to which this "checkgroups" message applies.

The chksernr parameter is a serial number, which can be any positive integer (e.g. a simple counter or the date in the form YYYYMMDD).
It SHOULD increase by an arbitrary value with every change to the group list and MUST NOT ever decrease.

NOTE: This was added to circumvent security problems in situations where the Date header cannot be authenticated.

Example:

      Control: checkgroups de !de.alt #248

NOTE: Some existing software does not support the "chkscope" parameter. Thus a "checkgroups" message SHOULD also contain the groups of other subhierarchies the sender is not responsible for. "New" software MUST ignore groups which do not fall into the scope of the "checkgroups" message.

If no scope for the checkgroups message is given, it applies to all hierarchies for which group statements appear in the message.

The body of the message has the Content-Type "application/news-checkgroups". It asserts that the newsgroups it lists are the only newsgroups in the specified hierarchies.

NOTE: The checkgroups message is intended to synchronize the list of newsgroups stored by a serving agent, and their newsgroup-descriptions, with the lists stored by other serving agents throughout the network. However, it might be inadvisable for the serving agent actually to create or delete any newsgroups without first obtaining the approval of its administrators for such proposed actions.

7.4.1. Application/news-checkgroups

The "application/news-checkgroups" body part contains a complete list of all the newsgroups in a hierarchy, their newsgroup-descriptions and their moderation status.

The MIME content type definition of "application/news-checkgroups" is:

      MIME type name:           application
      MIME subtype name:        news-checkgroups
      Required parameters:      none
      Disposition:              by default, inline
      Encoding considerations:  "7bit" or "8bit" is sufficient and MUST
                                be used to maintain compatibility.
      Security considerations:  this type MUST NOT be used except as
                                part of a checkgroups control message

The content of the "application/news-checkgroups" body part is defined as:

      checkgroups-body    = *( valid-group CRLF )
      valid-group         = newsgroups-line ; see 7.1.2

The whole checkgroups-body is intended to be interpreted as a text written in the UTF-8 character set.

The "application/news-checkgroups" content type is used in conjunction with the "checkgroups" control message (7.4).

NOTE: The possibility of removing a complete hierarchy by means of an "invalidation" line beginning with a '!' is no longer provided by this standard. The intent of the feature was widely misunderstood and it was misused more often than it was used correctly. The same effect, if required, can now be obtained by the use of an appropriate chkscope argument in conjunction with an empty checkgroups-body.

7.5. Cancel

The cancel message requests that a target article be "canceled", i.e. be withdrawn from circulation or access.

A cancel message may be issued in the following circumstances.

1. The poster of an article (or, more specifically, any entity mentioned in the From header or the Sender header, whether or not that entity was the actual poster) is always entitled to issue a cancel message for that article, and serving agents SHOULD honour such requests. Posting agents SHOULD facilitate the issuing of cancel messages by posters fulfilling these criteria.

2. The agent which injected the article onto the network (more specifically, the entity identified by the path-identity in front of the leftmost '%' delimiter in the Path header (5.6) or in the Injector-Info header (6.19)) and, where appropriate, the moderator (more specifically, any entity mentioned in the Approved header) are always entitled to issue a cancel message for that article, and serving agents SHOULD honour such requests.

3. Other entities MAY be entitled to issue a cancel message for that article, in circumstances where established policy for any hierarchy or group in the Newsgroups header, or established custom within Usenet, so allows (such policies and customs are not defined by this standard). Such cancel messages MUST include an Approved header identifying the responsible entity. Serving agents MAY honour such requests, but SHOULD first take steps to verify their appropriateness.

[I think that accords with the accepted norms for 1st, 2nd and 3rd party cancels (or is a moderator a 1st party?). Observe the use of an Approved header in place of the present X-Cancelled-By (I cannot see that we need a new header for that when Approved is available). The definitions given are sufficient to establish which category a cancel was in, assuming that nobody told any lies, and to establish who had committed abuse otherwise. So far so good, but we now need authentication methods on top of all that.]

[A future draft of this standard may contain provisions for a Cancel-Lock header to enable verification of the authenticity of 1st (and even 2nd) party cancels, and means for digital signatures to establish the authenticity of 3rd party cancels.]

[A future draft of this standard may also contain provision for a "block cancel" message, with a list of messages to be canceled contained in its body rather than in the headers. Whether this needs to have a Control header at all, and whether the existing "nocem-on-spool" is adequate for this purpose, and indeed whether NOCEM as such should be part of this, or some other, standard are issues that are yet to be addressed.]

      cancel-verb         = "cancel"
      cancel-arguments    = CFWS message-id

The argument identifies the article to be cancelled by its message identifier. The body SHOULD contain an indication of why the cancellation was requested.
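
The three circumstances listed above can be sketched as a classification routine. This is a minimal Python illustration under stated assumptions: the function name, its parameters and the returned labels are hypothetical, identities are compared as plain strings, and the question of how the issuer's identity is authenticated is deliberately left open, as it is in this draft.

```python
def classify_cancel(issuer, target_from_sender, target_injector,
                    target_approved, cancel_has_approved):
    """Return a hypothetical label for the category of a cancel message.

    issuer              -- identity responsible for the cancel (assumed known)
    target_from_sender  -- set of identities in the target's From/Sender
    target_injector     -- identity of the target's injecting agent
    target_approved     -- set of identities in the target's Approved header
    cancel_has_approved -- True if the cancel carries an Approved header
    """
    if issuer in target_from_sender:
        return "first-party"        # circumstance 1: the poster
    if issuer == target_injector or issuer in target_approved:
        return "second-party"       # circumstance 2: injector or moderator
    if cancel_has_approved:
        return "third-party"        # circumstance 3: policy or custom applies
    # A cancel from any other entity MUST include an Approved header.
    return "malformed-third-party"
```

A serving agent might honour the first two categories by default and subject the third to local policy checks before acting on it.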
The cancel message SHOULD be posted to the same newsgroup(s), with the same distribution(s), as the article it is attempting to cancel.

A serving agent that elects to honour a cancel message SHOULD delete the target article completely and immediately (or at the minimum make the article unavailable for relaying or serving) and also SHOULD reject any copies of this article that appear subsequently. See also sections 8.3 and 8.4.

NOTE: The former requirement [RFC 1036] that the From and/or Sender headers of the cancel message should match those of the original article has been removed from this standard, since it only encouraged cancel issuers to conceal their true identity, and it was not usually checked or enforced by canceling software. Therefore, both the From and/or Sender headers and any Approved header should now relate to the entity responsible for issuing the cancel message.

7.6. Ihave, sendme

The "ihave" and "sendme" control messages implement a crude batched predecessor of the NNTP [NNTP] protocol. They are largely obsolete on the Internet, but still see use in conjunction with some transport protocols such as UUCP, especially for backup feeds that normally are active only when a primary feed path has failed. There is no requirement for relaying agents that do not support such transport protocols to implement them.

NOTE: The ihave and sendme messages defined here have ABSOLUTELY NOTHING TO DO WITH NNTP, despite similarities of terminology.

The two messages share the same syntax:

      ihave-arguments     = *( msg-id SP ) relayer-name
      sendme-arguments    = ihave-arguments
      relayer-name        = path-identity ; see 5.6.1
      ihave-body          = *( msg-id CRLF )
      sendme-body         = ihave-body

Msg-ids MUST appear in either the arguments or the body, but NOT both. Relayers SHOULD generate the form putting msg-ids in the body, but the other form MUST be supported for backward compatibility.
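
As an illustration of the shared syntax, the following Python sketch extracts the relayer-name and the message identifiers from either form. The function name parse_ihave is hypothetical, tokens are split on any whitespace (more lenient than the single SP in the grammar), and no validation of the msg-id syntax itself is attempted.

```python
def parse_ihave(arguments, body=""):
    """Parse ihave/sendme arguments and body.

    Returns (relayer_name, list_of_msg_ids). Raises ValueError when the
    relayer-name is missing or when msg-ids appear in both places.
    """
    tokens = arguments.split()
    if not tokens:
        raise ValueError("relayer-name is required")
    # Everything before the final token is a msg-id; the last token is
    # the relayer-name.
    *arg_ids, relayer = tokens
    body_ids = [line.strip() for line in body.splitlines() if line.strip()]
    if arg_ids and body_ids:
        raise ValueError("msg-ids must not appear in both arguments and body")
    return relayer, (arg_ids or body_ids)
```

The same function serves for sendme, since sendme-arguments and sendme-body are defined in terms of their ihave counterparts.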
The ihave message states that the named relaying agent has received articles with the specified message identifiers, which may be of interest to the relaying agents receiving the ihave message. The sendme message requests that the agent receiving it send the articles having the specified message identifiers to the named relaying agent.

These control messages are normally sent essentially as point-to-point messages, by using newsgroup-names in the Newsgroups header of the form "to." followed by one or more components in the form of a relayer-name (see section 5.5.1 which forbids "to" as the first component of a newsgroup-name). The control message SHOULD then be delivered ONLY to the relaying agent(s) identified by that relayer-name, and any relaying agent receiving such a message which includes its own relayer-name MUST NOT propagate it further. Each pair of relaying agent(s) sending and receiving these messages MUST be immediate neighbors, exchanging news directly with each other. Each relaying agent advertises its new arrivals to the other using ihave messages, and each uses sendme messages to request the articles it lacks.

To reduce overhead, ihave and sendme messages SHOULD be sent relatively infrequently and SHOULD contain reasonable numbers of message IDs. If ihave and sendme are being used to implement a backup feed, it may be desirable to insert a delay between reception of an ihave and generation of a sendme, so that a slightly slow primary feed will not cause large numbers of articles to be requested unnecessarily via sendme.

7.7. Obsolete control messages.

The following control message verbs are declared obsolete by this standard:

      sendsys
      version
      whogets
      senduuname

8. Duties of Various Agents

The following section sets out the duties of various agents involved in the creation, relaying and serving of Usenet articles.
In this section, the word "trusted", as applied to the source of some article, means that an agent processing that article has verified, by some means, the identity of that source (which may be another agent or a poster).

NOTE: In many implementations, a single agent may perform various combinations of the injecting, relaying and serving functions. Its duties are then the union of the various duties concerned.

8.1. General principles to be followed

There are two important principles that news implementors (and administrators) need to keep in mind. The first is the well-known Internet Robustness Principle:

      Be liberal in what you accept, and conservative in what you send.

However, in the case of news there is an even more important principle, derived from a much older code of practice, the Hippocratic Oath (we will thus call this the Hippocratic Principle):

      First, do no harm.

It is VITAL to realize that decisions which might be merely suboptimal in a smaller context can become devastating mistakes when amplified by the actions of thousands of hosts within a few minutes.

In the case of gateways, the primary corollary to this is:

      Cause no loops.

8.2. Duties of an Injecting Agent

An Injecting Agent is responsible for taking a proto-article from a posting agent and either forwarding it to a moderator or injecting it into the relaying system for access by readers.

As such, an injecting agent is considered responsible for ensuring that any article it injects conforms with the rules of this standard and the policies of any newsgroups or hierarchies that the article is posted to. It is also expected to bear some responsibility towards the rest of the network for the behaviour of its posters (and provision is therefore made for it to be easily contactable by email).

To this end injecting agents MAY cancel articles which they have previously injected (see 7.5).

8.2.1. Proto-articles

A proto-article is one that has been created by a posting agent and has not yet been injected into the news system by an injecting agent. It SHOULD NOT be propagated in that form to other than injecting agents.

A proto-article has the same format as a normal article except that some of the following mandatory headers MAY be omitted: Message-Id, Date and Path. These headers MUST NOT contain invalid values; they MUST either be correct or not present at all.

A proto-article SHOULD NOT contain the '%' delimiter in any Path header, except in the rare cases where an article gets injected twice. It MAY contain path-identities with other delimiters in the pre-injection portion of the Path header (5.6.3).

8.2.2. Procedure to be followed by Injecting Agents

An injecting agent receives proto-articles from posting and followup agents. It verifies them, adds headers where required and then either forwards them to a moderator or injects them by passing them to serving or relaying agents.

If an injecting agent receives an otherwise valid article that has already been injected it SHOULD either act as if it is a relaying agent or else pass the article on to a relaying agent completely unaltered. Exceptionally, it MAY reinject the article, perhaps as a part of some complex gatewaying process (in which case it will add a second '%' delimiter to the Path header). It MUST NOT forward an already injected article to a moderator.

An injecting agent processes articles as follows:

1. It MUST remove any Injector-Info or Complaints-To header already present (though it might be useful to copy them to suitable X-headers). It SHOULD likewise remove any NNTP-Posting-Host or other undocumented tracing header.

2. It SHOULD verify that the article is from a trusted source. However, it MAY allow articles in which headers contain "forged" email addresses, that is, addresses which are not valid for the known and trusted source, especially if they end in ".invalid".

3. It MUST reject any article whose Date header is more than 24 hours into the past or into the future (cf. 5.1).

4. It MUST reject any article that does not have the correct mandatory headers for a proto-article (5 and 8.2.1) present, or which contains any header that does not have legal contents.

5. If the article is rejected, or is otherwise incorrectly formatted or unacceptable due to site policy, the posting agent MUST be informed (such as via an NNTP 44x response code) that posting has failed and the article MUST NOT then be processed further.

6. The Message-ID and Date headers (and their content) MUST be added when not already present.

7. A Path header with a tail-entry (5.6.3) MUST be correctly added if not already present (except that it SHOULD NOT be added if the article is to be forwarded to a moderator).

8. The path-identity of the injecting agent with a '%' delimiter (5.6.2) MUST be prepended to the Path header; moreover, that path-identity MUST be an FQDN mailable address (5.6.2).

9. An Injector-Info header (6.19) SHOULD be added, identifying the trusted source of the article, and a suitable Complaints-To header (6.20) MAY be added (except that these two headers SHOULD NOT be added if the article is to be forwarded to a moderator).

10. The injecting agent MAY add other headers not already provided by the poster, but SHOULD NOT alter, delete or reorder any headers already present in the article (except for headers intended for tracing purposes, such as Injector-Info and Complaints-To, as already mentioned). The injecting agent MUST NOT alter the body of the article in any way.

11. If the Newsgroups line contains one or more moderated groups and the article does NOT contain an Approved header, then the injecting agent MUST forward it to the moderator of the first (leftmost) moderated group listed in the Newsgroups line via email.
The complete article SHOULD be encapsulated (headers and all) within the email, preferably using the Content-Type "application/news-transmission" (6.21.6.1).

12. Otherwise, the injecting agent forwards the article to one or more relaying or serving agents.

8.3. Duties of a Relaying Agent

A Relaying Agent accepts injected articles from injecting and other relaying agents and passes them on to relaying or serving agents according to mutually agreed policy. Relaying agents SHOULD accept articles ONLY from trusted agents.

A relaying agent processes articles as follows:

1. It MUST verify the leftmost entry in the Path header and then prepend its own path-identity with a '/' delimiter, and possibly also the verified path-identity of its source with a '?' delimiter (5.6.2).

2. It MUST reject any article whose Date header is stale (see 5.1).

3. It MUST reject any article that does not have the correct mandatory headers (section 5) present with legal contents.

4. It SHOULD reject any article whose optional headers (section 6) do not have legal contents.

5. It SHOULD reject any article that has already been sent to it (a database of message identifiers of recent messages is usually kept and matched against).

6. It SHOULD reject any article that has already been Canceled, Superseded or Replaced by its author or by another trusted entity.

7. It MAY reject any article without an Approved header posted to newsgroups known to be moderated (this practice is strongly recommended, but the information necessary to do it may not be available to all agents).

8. It then passes articles which match mutually agreed criteria on to neighboring relaying and serving agents. However, it SHOULD NOT forward articles to sites whose path-identity is already in the Path header.

NOTE: It is usual for relaying and serving agents to restrict the Newsgroups, Distributions, age and size of articles that they wish to receive.
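
Step 8 of the relaying procedure above, which avoids offering an article to a site already named in its Path header, might be sketched as follows. This is an illustration only: the function names are hypothetical, and the set of delimiter characters is an assumption (this section mentions '%', '/' and '?'; the traditional '!' and ',' are added here, while the authoritative syntax is that of section 5.6). Path-identities are compared as exact strings.

```python
import re

# Assumed delimiter characters between path-identities (see lead-in).
_PATH_DELIMITERS = re.compile(r"[!/?%,]")


def path_identities(path):
    """Return the set of path-identities found in a Path header value."""
    return {p.strip() for p in _PATH_DELIMITERS.split(path) if p.strip()}


def peers_to_feed(path, peers):
    """Return the peers whose path-identity is not already in the Path."""
    seen = path_identities(path)
    return [peer for peer in peers if peer not in seen]
```

For example, given the Path value "b.example/a.example%poster" and the peers a.example and c.example, only c.example would be offered the article.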
If the article is rejected as being invalid, unwanted or unacceptable due to site policy, the agent that passed the article to the relaying agent SHOULD be informed (such as via an NNTP 43x response code) that relaying failed. In order to prevent a large number of error messages being sent to one location, relaying agents MUST NOT inform any other external entity that an article was not relayed UNLESS that external entity has explicitly requested that it be informed of such errors.

NOTE: In order to prevent overloading, relaying agents should not routinely query an external entity (such as a DNS-server) in order to verify an article (though a local cache of the required information might usefully be consulted).

Relaying agents MUST NOT alter, delete or rearrange any part of an article except for the Path and Xref headers.

8.4. Duties of a Serving Agent

A Serving Agent takes an article from a relaying or injecting agent and files it in a "news database". It also provides an interface for reading agents to access the news database. This database is normally indexed by newsgroup with articles in each newsgroup identified by an article-locater (usually in the form of a decimal number - see 6.16).

NOTE: Since control messages are often of interest, but should not be displayed as normal articles in regular newsgroups, it is common for serving agents to make them available in a pseudo-newsgroup named "control" or in a pseudo-newsgroup in a sub-hierarchy under "control." (e.g. "control.cancel").

A serving agent processes articles as follows:

1. It MUST verify the leftmost entry in the Path header and then prepend its own path-identity with a '/' delimiter, and possibly also the verified path-identity of its source with a '?' delimiter (5.6.2).

2. It MUST reject any article whose Date header is stale (see 5.1).

3. It MUST reject any article that does not have the correct mandatory headers (section 5) present, or which contains any header that does not have legal contents.

4. It SHOULD reject any article that has already been sent to it (a database of message identifiers of recent messages is usually kept and matched against).

5. It SHOULD reject any article that has already been Canceled, Superseded or Replaced by its author or by another trusted entity, and delete any copy of such an article that it already has in its news database.

6. It MUST reject any article without an Approved header posted to any moderated newsgroup which it is configured to receive, and it MAY reject such articles for any newsgroup it knows to be moderated.

7. It SHOULD generate a correct Xref header (6.16) for each article.

8. Finally, it stores the article in its news database.

8.5. Duties of a Posting Agent

A Posting Agent is used to assist the poster in creating a valid proto-article and forwarding it to an injecting agent.

Posting agents SHOULD ensure that proto-articles they create are valid Netnews articles according to this standard and other applicable policies.

Posting agents meant for use by ordinary posters SHOULD reject any attempt to post an article which cancels, Supersedes or Replaces another article of which the poster is not the author.

8.6. Duties of a Followup Agent

A Followup Agent is a special case of a posting agent and as such is bound by all the posting agent's requirements plus additional ones. Followup agents MUST create valid followups, in particular by providing correctly adjusted forms of those headers described as inheritable (4.2.2.2), notably the Newsgroups header (5.5), the Subject header (5.4) and the References header (6.10), and they Ought to observe appropriate quoting conventions in the body (see 4.3.2).
Followup agents SHOULD initialize the Newsgroups header from the precursor's Followup-To header, if present, when preparing a followup; however posters MAY then change this before posting if they wish.

Followup agents MUST NOT attempt to send email to any address ending in ".invalid". Followup agents SHOULD NOT email copies of the followup to the author of the precursor unless this has been explicitly requested by means of a Mail-Copies-To header (6.8), but they SHOULD include a Posted-And-Mailed header (6.9) whenever a copy is so emailed.

8.7. Duties of a Moderator

A Moderator receives news articles by email, decides whether to accept them and, if so, either injects them into the news stream or forwards them to further moderators.

A moderator processes an article, as submitted to any newsgroup that he moderates, as follows:

1. He decides, on the basis of whatever moderation policy applies to his group, whether to accept or reject the article. He MAY do this manually, or else partially or wholly with the aid of appropriate software for whose operation he is then responsible. He MAY modify the article if that is in accordance with the applicable moderation policy (and in particular he MAY remove redundant headers and add Comments and other informational headers). He MAY inform the poster as to whether the article has been accepted or rejected. If the article is rejected, then it fails for all the newsgroups for which it was intended (in particular the moderator SHOULD NOT resubmit the article, with a reduced Newsgroups header, to any remaining groups, especially if this will break any authentication checks present in the article). If the article is accepted, the moderator proceeds with the following steps.

2. The Date header SHOULD be retained, except that if it is stale (5.1) for reasons understood by the moderator (e.g. delays in the moderation process) he MAY substitute the current date (but must then take responsibility for any loops that ensue).
Any local headers (4.2.2.3) or variant headers (4.2.2.4) MUST be removed, except that a Path header MAY be truncated to only its pre-injection region (5.6.3). Any Injector-Info header (6.19) or Complaints-To header (6.20) MUST be removed.

[Note several differences from Kent Landfield's 'Moderator's Handbook'. The original Date and Message-ID are retained. Any Distribution header is retained. Any Sender header is retained. Various other minor headers are retained (though the moderator MAY, of course, remove them).]

3. He adds an Approved header (6.14) containing a mailbox identifying himself (or, if the article already contains an Approved header from another moderator, he adds that identifying information to it). He MAY also add further headers to authenticate that the article has been properly approved.

[That can be strengthened when we have defined proper authentication mechanisms.]

4. If the Newsgroups header contains further moderated newsgroups for which approval has not already been given, he forwards the article to the moderator of the leftmost such group (which, if this standard has been followed correctly, will always be the group immediately to the right of the group(s) for which he is responsible). However, he MUST NOT alter the order in which the newsgroups are listed in the Newsgroups header.

5. Otherwise, he causes the article to be injected, having first observed all the duties of a posting agent (8.5).

NOTE: This standard does not prescribe how the moderator or moderation policy for each newsgroup is established; rather it assumes that whatever agencies are responsible for the relevant network or hierarchy (1.1) will have made appropriate arrangements in that regard.

It SHOULD be the case that articles will be received by the moderator encapsulated as an object of Content-Type application/news-transmission (8.2.2), or possibly encapsulated but without an explicit Content-Type header.
In such a case, the complete article is immediately available for processing by the moderator. However, prior to the introduction of this standard, it was more common for injecting agents to transform proto-articles into mail messages, mixing the Netnews headers with the Mail headers. Moderators SHOULD therefore be prepared to accept submission in this format, although they need then to be aware of the Duties of an Incoming Gateway (8.8.2) (and, in particular, they SHOULD adopt the Message-ID and Date headers of the mail message, though they SHOULD NOT add any Sender header).

8.8. Duties of a Gateway

A Gateway transforms an article into the native message format of another medium, or translates the messages of another medium into news articles. Encapsulation of a news article into a message of MIME type application/news-transmission, or the subsequent undoing of that encapsulation, is not gatewaying, since it involves no transformation of the article.

There are two basic types of gateway, the Outgoing Gateway that transforms a news article into a different type of message, and the Incoming Gateway that transforms a message from another medium into a news article and injects it into a Netnews system. These are handled separately below.

The primary dictate for a gateway is:

      Above all, prevent loops.

Transformation of an article into another medium stands a very high chance of discarding or interfering with the protection inherent in the news system against duplicate articles. The most common problem caused by gateways is "spews," gateway loops that cause previously posted articles to be reinjected repeatedly into Usenet. To prevent this, a gateway MUST take precautions against loops, as detailed below.
If bidirectional gatewaying (both an incoming and an outgoing gateway) is being set up between Netnews and some other medium, the incoming and outgoing gateways SHOULD be coordinated to avoid reinjection of gated articles. Circular gatewaying (gatewaying a message into another medium and then back into Netnews) SHOULD NOT be done; encapsulation of the article SHOULD be used instead where this is necessary.

A second general principle of gatewaying is that the transformations applied to the message SHOULD be as minimal as possible while still accomplishing the gatewaying. Every change made by a gateway potentially breaks a property of one of the media or loses information, and therefore only those transformations made necessary by the differences between the media should be applied.

It is worth noting that safe bidirectional gatewaying between a mailing list and a newsgroup is far easier if the newsgroup is moderated. Posts to the moderated group and submissions to the mailing list can then go through a single point that does the necessary gatewaying and then sends the message out to both the newsgroup and the mailing list at the same time, eliminating most of the possibility of loops. Bidirectional gatewaying between a mailing list and an unmoderated newsgroup, in contrast, is difficult to do correctly and is far more fragile. Newsgroups intended to be bidirectionally gated to a mailing list SHOULD therefore be moderated where possible, even if the moderator is a simple gateway and injecting agent that correctly handles crossposting to other moderated groups and otherwise passes all traffic.

8.8.1. Duties of an Outgoing Gateway

From the perspective of Netnews, an outgoing gateway is just a special type of reading agent. The exact nature of what the outgoing gateway will need to do to articles depends on the medium to which the articles are being gated.
The operation of the outgoing gateway is only subject to additional constraints in the presence of one or more corresponding incoming gateways back from that medium to Netnews, since this opens the possibility of loops. It is recommended, however, that the following practices be followed by all outgoing gateways regardless of whether there is known to be a related incoming gateway, both as a precautionary measure and as a guideline to quality of implementation.

1. Only the minimal necessary changes should be made, as stated above.

2. The message identifier of the news article should be preserved if at all possible, preferably as or within the corresponding unique identifier of the other medium, but if not at least as a comment in the message. This helps greatly with preventing loops.

3. The Date of the news article should also be preserved if possible, for similar reasons.

4. The message should be tagged in some way so as to prevent its reinjection into Netnews. This may be impossible to do without knowledge of potential incoming gateways, but it is better to try to provide some indication even if not successful; at the least, a human-readable indication that the article should not be gated back to Netnews can help locate a human problem.

5. News control messages should not be gated to another medium unless they would somehow be meaningful in that medium.

8.8.2. Duties of an Incoming Gateway

The incoming gateway has the serious responsibility of ensuring that all of the requirements of this standard are met by the articles that it forms. In addition to its special duties as a gateway, it bears all of the duties and responsibilities of an injecting agent, and additionally has the same responsibility as a relaying agent to reject articles that it has already gatewayed.

An incoming gateway MUST NOT gate the same message twice.
It may not be possible to ensure this in the face of mangling or modification of the message, but at the very least a gateway, when given a copy of a message it has already gated that is identical except for trace headers (like Received in e-mail or Path in Netnews), MUST NOT gate the message again. An incoming gateway SHOULD take precautions against having this rule bypassed by modifications of the message that can be anticipated.

News articles prepared by gateways MUST be legal news articles. In particular, they MUST include all of the mandatory headers and MUST fully conform to the restrictions on said headers. This often requires that a gateway function not only as a relaying agent, but also partly as a posting agent, aiding in the synthesis of a conforming article from non-conforming input.

Incoming gateways MUST NOT pass control messages (articles containing a Control header) without removing or renaming that header. Gateways MAY, however, generate their own cancel messages, under the general allowance for injecting agents to cancel their own messages (7.5). If a gateway receives a message that it can determine is a valid equivalent of a cancel message in the medium it is gatewaying, it SHOULD discard that message without gatewaying it, generate a corresponding cancel message of its own, and inject that cancel message.

Incoming gateways MUST NOT inject control messages other than cancels. Encapsulation SHOULD be used instead of gatewaying, when direct posting is not possible or desirable.

NOTE: It is not unheard of for mail-to-news gateways to be used to post control messages, but encapsulation should be used for these cases instead. Gateways by their very nature are particularly prone to loops. Spews of normal articles are bad enough; spews of control messages with special significance to the news system, possibly resulting in high processing load or even e-mail sent for every message received, are catastrophic.
It is far preferable to construct a system specifically for posting control messages that can do appropriate consistency checks and authentication of the originator of the message.

If there is a message identifier that fills a role similar to that of the Message-ID header in news, it SHOULD be used in the formation of the message identifier of the news article, perhaps with transformations required to meet the uniqueness requirement of Netnews. This transformation SHOULD be designed so that two messages with the same identifier generate the same Message-ID header.

NOTE: Message identifiers play a central role in the prevention of duplicates, and their correct use by gateways will do much to prevent loops. Netnews does, however, require that message identifiers be unique, and therefore message identifiers from other media may not be suitable for use without modification. A balance must be struck by the gateway between preserving information used to prevent loops and generating unique message identifiers.

Exceptionally, if there are multiple incoming gateways for a particular set of messages, each to a different newsgroup or newsgroups, each one SHOULD generate a message identifier unique to that gateway. Each incoming gateway nonetheless MUST ensure that it does not gate the same message twice.

NOTE: Consider the example of two gateways of a given mailing list into the world-wide Usenet newsgroups, both of which preserve the mail message identifier. Each newsgroup may then receive a portion of the messages (different sites seeing different portions). In these cases, where there is no one "official" gateway, some other method of generating message identifiers has to be used to avoid collisions. It would obviously be preferable for there to be only one gateway which crossposts, but this may not be possible to coordinate.
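
As an illustration only, the Message-ID guidance above might be sketched in Python as follows. Nothing in this sketch is prescribed by this standard: the function and class names, the "gate." prefix, the use of SHA-1, and the simplified syntax check (which is NOT the full msg-id grammar) are all assumptions made for the example. The sketch preserves a usable foreign identifier where it can, and otherwise derives a Message-ID deterministically so that the same input always yields the same output.

```python
import hashlib
import re

# Simplified shape check, assumed for this sketch; it is NOT the
# complete msg-id syntax defined by this standard.
_SIMPLE_MSGID = re.compile(r"<[^\s<>@]+@[^\s<>@]+>")


def news_message_id(foreign_id, gateway_domain, unique_per_gateway=False):
    """Map a foreign message identifier to a news Message-ID.

    The mapping is deterministic: the same foreign_id and
    gateway_domain always produce the same result.
    """
    if not unique_per_gateway and _SIMPLE_MSGID.fullmatch(foreign_id):
        # Single "official" gateway: keep the identifier unchanged so
        # that duplicate suppression in the news system can work.
        return foreign_id
    # Otherwise derive an identifier specific to this gateway.
    digest = hashlib.sha1(foreign_id.encode("utf-8")).hexdigest()
    return "<gate.%s@%s>" % (digest, gateway_domain)


class GatedLog:
    """Hypothetical in-memory record of identifiers already gated."""

    def __init__(self):
        self._seen = set()

    def should_gate(self, foreign_id):
        """Return True the first time an identifier is seen, else False."""
        if foreign_id in self._seen:
            return False
        self._seen.add(foreign_id)
        return True
```

A real gateway would need to keep its record of gated identifiers in persistent storage and would apply the full syntax rules of this standard; the in-memory set here only shows the shape of the check.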
If no date information is available, the gateway MAY supply a Date header with the gateway's current date. If only partial information is available (e.g. date but not time), this SHOULD be fleshed out to a full Date header by adding default values rather than discarding this information. Only in very exceptional circumstances should Date information be discarded, as it plays an important role in preventing reinjection of old messages.

An incoming gateway MUST add a Sender header to the news article it forms containing the mailbox of the administrator of the gateway. Problems with the gateway may be reported to this address. The display-name portion of this mailbox SHOULD indicate that the entity responsible for injection of the message is a gateway. If the original message already had a Sender header, it SHOULD be renamed so that its contents can be preserved.

8.8.3. Example

To illustrate the type of precautions that should be taken against loops, here is an example of the measures taken by one particular combination of mail-to-news and news-to-mail gateways at Stanford University designed to handle bidirectional gatewaying between mailing lists and unmoderated groups.

1. The news-to-mail gateway preserves the message identifier of the news article in the generated mail message. The mail-to-news gateway likewise preserves the mail message identifier provided that it is syntactically valid for Netnews. This allows the news system's built-in suppression of duplicates to serve as the first line of defense against loops.

2. The news-to-mail gateway adds an X-Gateway header to all messages it generates. The mail-to-news gateway discards any incoming messages containing this header. This is robust against mailing list managers that replace the message identifier, and against any number of mail hops, provided that the other message headers are preserved.

3. The mail-to-news gateway inserts the host name from which it received the mail message in the pre-injection region of the Path (5.6.3). The news-to-mail gateway refuses to gateway any message that contains the list server name in the pre-injection region of its Path header. This is robust against any amount of munging of the message headers by the mailing list, provided that the mail only goes through one hop.

4. The mail-to-news gateway is designed never to generate bounces to the envelope sender. Instead, articles that are rejected by the news server (for reasons not warranting silent discarding of the message) result in a bounce message sent to an errors address known not to forward to any mailing lists, so that they can be handled by the news administrators.

These precautions have proven effective in practice at preventing loops for this particular application (bidirectional gatewaying between mailing lists and locally distributed newsgroups where both gateways can be designed together). General gatewaying to world-wide newsgroups poses additional difficulties; one must be very wary of strange configurations, such as a newsgroup gated to a mailing list which is in turn gated to a different newsgroup.

9. Security and Related Considerations

There is no security. Don't fool yourself. Usenet is a prime example of an Internet Adhocratic-Anarchy; that is, an environment in which trust forms the basis of all agreements. It works.

9.1. Leakage

Articles which are intended to have restricted distribution are dependent on the goodwill of every site receiving them. The "Archive: no" header is available as a signal to automated archivers not to file an article, but that cannot be guaranteed.

The Distribution header makes provision for articles which should not be propagated beyond a cooperating subnet. The key security word here is "cooperating".
When a machine is not configured properly, it may become uncooperative and tend to distribute all articles.

The flooding algorithm is extremely good at finding any path by which articles can leave a subnet with supposedly restrictive boundaries, and substantial administrative effort is required to avoid this. Organizations wishing to control such leakage are strongly advised to designate a small number of official gateways to handle all news exchange with the outside world (however, making such gateways too restrictive can also encourage the setting up of unofficial paths which can be exceedingly hard to track down).

The sendme control message (7.6), insofar as it is still used, can be used to request articles with a given message identifier, even one that is not supposed to be supplied to the requestor.

9.2. Attacks

9.2.1. Denial of Service

The proper functioning of individual newsgroups can be disrupted by the massive posting of "noise" articles, by the repeated posting of identical or near identical articles, by posting followups unrelated to their precursors, or which quote their precursors in full with the addition of minimal extra material (especially if this process is iterated), and by crossposting to, or setting followups to, totally unrelated newsgroups.

Many have argued that "spam", massively multiposted (and to a lesser extent massively crossposted) articles, usually for advertising purposes, also constitutes a DoS attack in its own regard. This may be so.

Such articles intended to deny service, or other articles of an inflammatory nature, may also have their From or Reply-To addresses set to valid but incorrect email addresses, thus causing large volumes of mail to descend on the true owners of those addresses. It is a violation of this standard for a poster to use as his address a mailbox which he is not entitled to use.
Even addresses with an invalid local-part but a valid domain can cause disruption to the administrators of such domains. Posters who wish to remain anonymous or to prevent automated harvesting of their addresses, but who do not care to take the additional precautions of using more sophisticated anonymity measures, should avoid that violation by the use of addresses ending in the ".invalid" top-level domain (see 5.2).

A malicious poster may also prevent his article being seen at a particular site by preloading that site into the Path header (5.6.1) and may thus prevent the true owner of a forged From or Reply-To address from ever seeing it.

Administrative agencies with responsibility for establishing policies in particular hierarchies can and should set bounds upon the behaviour that is considered acceptable within those hierarchies (for example by promulgating charters for individual newsgroups, and other codes of conduct).

Whilst this standard places an onus upon injecting agents to bear responsibility for the misdemeanours of their posters (which include non-adherence to established policies of the relevant hierarchies as provided in section 8.2), and to provide assistance to the rest of the network by making proper use of the Injector-Info (6.19) and Complaints-To (6.20) headers, it makes no provision for enforcement, which may in consequence be patchy. Nevertheless, injecting sites which persistently fail to honour their responsibilities or to comply with generally accepted standards of behaviour are likely to find themselves blacklisted, with their articles refused propagation and even subject to cancellation, and other relaying sites would be well advised to withdraw peering arrangements from them.

9.2.2. Compromise of System Integrity

The posting of unauthorized (as determined by the policies of the relevant hierarchy) control messages can cause unwanted newsgroups to be created, or wanted ones removed, from serving agents.
Administrators of such agents SHOULD therefore take steps to verify the genuineness of such control messages, either by manual inspection (particularly of the Approved header) or by checking any digital signatures that may be provided. In addition, they SHOULD periodically compare the newsgroups carried against any regularly issued checkgroups messages, or against lists maintained by trusted servers and accessed by out-of-band protocols such as FTP or HTTP.

Malicious cancel messages (7.5) can cause valid articles to be removed from serving agents. Administrators of such agents SHOULD therefore take steps to verify that they originated from the poster, the injector or the moderator of the article, or that in other cases they came from a place that is trusted to work within established policies and customs. Articles containing Replaces and/or Supersedes headers (6.15) are effectively cancel messages, and SHOULD be subject to the same checks. Currently, many sites choose to ignore all cancel messages on account of the difficulty of conducting such checks.

[But we cannot really say much more until we have Cancel Locks and digital signatures in place.]

Improperly configured serving agents can allow articles posted to moderated groups onto the net without first being approved by the moderator. Injecting agents SHOULD verify that moderated articles were received from one of the entities given in their Approved headers and/or check any digital signatures that may be provided.

There may be weaknesses in particular implementations that are subject to malicious exploitation. In particular, it has not been unknown for complete shell scripts to be included within Control headers. Implementors need to be aware of this.
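
One possible defensive approach to the Control header problem, sketched here in Python purely as an illustration, is to parse the header value into a verb and arguments, accept only verbs from a fixed list, refuse arguments containing unexpected characters, and never hand any part of the value to a shell. The set of verbs, the function name and the permitted character set below are assumptions chosen for the example and are not taken from this standard; the character set is deliberately conservative and would reject some otherwise legitimate message identifiers.

```python
import re

# Illustrative whitelist of control verbs (an assumption of this sketch).
ALLOWED_VERBS = {"cancel", "newgroup", "rmgroup", "checkgroups"}

# Conservative set of characters permitted in an argument. Anything
# else (";", "|", backquotes, "$", quotes, ...) is refused outright.
_SAFE_ARG = re.compile(r"[A-Za-z0-9<>@.+_=-]+")


def parse_control(value):
    """Split a Control header value into (verb, args).

    Raises ValueError for an empty value, an unlisted verb, or any
    argument containing characters outside the permitted set.
    """
    parts = value.split()
    if not parts:
        raise ValueError("empty Control header")
    verb, args = parts[0].lower(), parts[1:]
    if verb not in ALLOWED_VERBS:
        raise ValueError("unsupported control verb: %r" % verb)
    for arg in args:
        if not _SAFE_ARG.fullmatch(arg):
            raise ValueError("suspicious control argument: %r" % arg)
    return verb, args
```

The returned verb would then be used to select an internal handler function directly, so that the header contents are treated only as data. Such parsing does nothing to establish that the control message is authorized; the checks described above are still needed.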
Reading agents should be chary of acting automatically upon MIME objects with an "application" Content-Type that could change the state of that agent, except in contexts where such applications are specifically expected (see 6.21). Even the Content-Type "text/html" could have unexpected side effects on account of embedded objects, especially embedded executable code or URLs that invoke non-news protocols such as HTTP [RFC 2616]. It is therefore generally recommended that reading agents do not enable the execution of such code (since it is extremely unlikely to have a valid application within Netnews) and that they only honour URLs referring to other parts of the same article.

Non-printable characters embedded in article bodies may have surprising effects on printers or terminals, notably by reconfiguring them in undesirable ways which may become apparent only after the reading agent has terminated.

9.3. Liability

There is a presumption that a poster who sends an article to Usenet intends it to be stored on a multitude of serving agents, and has therefore given permission for it to be copied to that extent. Nevertheless, Usenet is not exempt from the Copyright laws, and it should not be assumed that permission has been given for the article to be copied outside of Usenet, nor for its permanent archiving contrary to any Archive header that may be present.

Posters also need to be aware that they are responsible if they breach Copyright, Libel, Harassment or other restrictions relating to material that they post, and that they may possibly find themselves liable for such breaches in jurisdictions far from their own. Serving agents may also be liable in some jurisdictions, especially if the breach has been explicitly drawn to their attention.

Users who are concerned about such matters should seek advice from competent legal authorities.

10. References

[ANSI X3.4] "American National Standard for Information Systems - Coded Character Sets - 7-Bit American National Standard Code for Information Interchange (7-Bit ASCII)", ANSI X3.4, 1986.

[ISO 10646] "International Standard - Information technology - Universal Multiple-Octet Coded Character Set (UCS) - Part 1: Architecture and Basic Multilingual Plane", ISO/IEC 10646-1, 1993.

[ISO 3166] "Codes for the representation of names of countries and their subdivisions -- Part 1: Country codes", ISO 3166, 1997.

[ISO 8859] International Standard - Information Processing - 8-bit Single-Byte Coded Graphic Character Sets.
      Part 1: Latin alphabet No. 1, ISO 8859-1, 1987
      Part 2: Latin alphabet No. 2, ISO 8859-2, 1987
      Part 3: Latin alphabet No. 3, ISO 8859-3, 1988
      Part 4: Latin alphabet No. 4, ISO 8859-4, 1988
      Part 5: Latin/Cyrillic alphabet, ISO 8859-5, 1988
      Part 6: Latin/Arabic alphabet, ISO 8859-6, 1987
      Part 7: Latin/Greek alphabet, ISO 8859-7, 1987
      Part 8: Latin/Hebrew alphabet, ISO 8859-8, 1988

[MESSFOR] P. Resnick, "Internet Message Format Standard", draft-ietf-drums-msg-fmt-07.txt, March 1998.

[NNTP] S. Barber, "Network News Transport Protocol", draft-ietf-nntpext-base-*.txt.

[RFC 1034] P. Mockapetris, "Domain Names - Concepts and Facilities", RFC 1034, November 1987.

[RFC 1036] M. Horton and R. Adams, "Standard for Interchange of USENET Messages", RFC 1036, December 1987.

[RFC 1153] F. Wancho, "Digest Message Format", RFC 1153, April 1990.

[RFC 1738] T. Berners-Lee, L. Masinter, and M. McCahill, "Uniform Resource Locators (URL)", RFC 1738, December 1994.

[RFC 1847] J. Galvin, S. Murphy, S. Crocker, and N. Freed, "Security Multiparts for MIME: Multipart/Signed and Multipart/Encrypted", RFC 1847, October 1995.

[RFC 2015] M. Elkins, "MIME Security with Pretty Good Privacy (PGP)", RFC 2015, October 1996.

[RFC 2044] F. Yergeau, "UTF-8, a transformation format for Unicode and ISO 10646", RFC 2044, October 1996.
[RFC 2045] N. Freed and N. Borenstein, "Multipurpose Internet Mail Extensions (MIME) Part One: Format of Internet Message Bodies", RFC 2045, November 1996.

[RFC 2046] N. Freed and N. Borenstein, "Multipurpose Internet Mail Extensions (MIME) Part Two: Media Types", RFC 2046, November 1996.

[RFC 2047] K. Moore, "MIME (Multipurpose Internet Mail Extensions) Part Three: Message Header Extensions for Non-ASCII Text", RFC 2047, November 1996.

[RFC 2048] N. Freed, J. Klensin, and J. Postel, "Multipurpose Internet Mail Extensions (MIME) Part Four: Registration Procedures", RFC 2048, November 1996.

[RFC 2119] S. Bradner, "Key words for use in RFCs to Indicate Requirement Levels", RFC 2119, March 1997.

[RFC 2142] D. Crocker, "Mailbox Names for Common Services, Roles and Functions", RFC 2142, May 1997.

[RFC 2234] D. Crocker and P. Overell, "Augmented BNF for Syntax Specifications: ABNF", RFC 2234, November 1997.

[RFC 2279] F. Yergeau, "UTF-8, a transformation format of ISO 10646", RFC 2279, January 1998.

[RFC 2373] R. Hinden and S. Deering, "IP Version 6 Addressing Architecture", RFC 2373, July 1998.

[RFC 2606] D. Eastlake and A. Panitz, "Reserved Top Level DNS Names", RFC 2606, June 1999.

[RFC 2616] R. Fielding, J. Gettys, J. Mogul, H. Frystyk, L. Masinter, P. Leach, and T. Berners-Lee, "Hypertext Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999.

[RFC 820] J. Postel and J. Vernon, "Assigned Numbers", RFC 820, January 1983.

[RFC 822] D. Crocker, "Standard for the Format of ARPA Internet Text Messages.", STD 11, RFC 822, August 1982.

[RFC 850] Mark R. Horton, "Standard for interchange of Usenet messages", RFC 850, June 1983.

[RFC 976] Mark R. Horton, "UUCP mail interchange format standard", RFC 976, February 1986.

[SMTP] John C. Klensin and Dawn P. Mann, "Simple Mail Transfer Protocol", draft-ietf-drums-smtpupd-*.txt.

[Son-of-1036] Henry Spencer, "News article format and transmission", , June 1994.
[UNICODE] The Unicode Consortium, "The Unicode Standard - Version 2.0", Addison-Wesley, 1996.

[USEFOR] Charles H. Lindsey, "News Article Format", draft-ietf-usefor-article-format-03.txt.

11. Acknowledgements

[It is intended to insert a list of those who have been prominent contributors to the mailing list of the working group at this point.]

12. Contact Addresses

Editor

      Charles H. Lindsey
      5 Clerewood Avenue
      Heald Green
      Cheadle
      Cheshire SK8 3JU
      United Kingdom
      Phone: +44 161 436 6131
      Email: chl@clw.cs.man.ac.uk

Working group chair

      David Barr
      Digital Island
      Email: barr@visi.com

Comments on this draft should preferably be sent to the mailing list of the Usenet Format Working Group at usenet-format@landfield.com.

This draft expires six months after the date of publication (see Page 1) (i.e. in October 2001).

13. Intellectual Property Rights

[The following are taken from RFC 2026. It is not entirely clear whether all of this is necessary at this stage. Please can someone explain it to me?]

The IETF takes no position regarding the validity or scope of any intellectual property or other rights that might be claimed to pertain to the implementation or use of the technology described in this document or the extent to which any license under such rights might or might not be available; neither does it represent that it has made any effort to identify any such rights. Information on the IETF's procedures with respect to rights in standards-track and standards-related documentation can be found in BCP-11. Copies of claims of rights made available for publication and any assurances of licenses to be made available, or the result of an attempt made to obtain a general license or permission for the use of such proprietary rights by implementors or users of this specification can be obtained from the IETF Secretariat.
The IETF invites any interested party to bring to its attention any copyrights, patents or patent applications, or other proprietary rights which may cover technology that may be required to practice this standard. Please address the information to the IETF Executive Director.

Copyright (C) The Internet Society (date). All Rights Reserved.

This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Internet Society or other Internet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process must be followed, or as required to translate it into languages other than English.

The limited permissions granted above are perpetual and will not be revoked by the Internet Society or its successors or assigns.

This document and the information contained herein is provided on an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Appendix A.1 - A-News Article Format

The obsolete "A News" article format consisted of exactly five lines of header information, followed by the body. For example:

      Aeagle.642
      news.misc
      cbosgd!mhuxj!mhuxt!eagle!jerry
      Fri Nov 19 16:14:55 1982
      Usenet Etiquette - Please Read
      body
      body
      body

The first line consisted of an "A" followed by an article ID (analogous to a message ID and used for similar purposes). The second line was the list of newsgroups. The third line was the path. The fourth was the date, in the format above (all fields fixed width), resembling an Internet date but not quite the same. The fifth was the subject.

This format is documented for archeological purposes only. Articles MUST NOT be generated in this format.

Appendix A.2 - Early B-News Article Format

The obsolete pseudo-Internet article format, used briefly during the transition between the A News format and the modern format, followed the general outline of a MAIL message but with some non-standard headers. For example:

      From: cbosgd!mhuxj!mhuxt!eagle!jerry (Jerry Schwarz)
      Newsgroups: news.misc
      Title: Usenet Etiquette -- Please Read
      Article-I.D.: eagle.642
      Posted: Fri Nov 19 16:14:55 1982
      Received: Fri Nov 19 16:59:30 1982
      Expires: Mon Jan 1 00:00:00 1990

      body
      body
      body

The From header contained the information now found in the Path header, plus possibly the full name now typically found in the From header. The Title header contained what is now the Subject content. The Posted header contained what is now the Date content. The Article-I.D. header contained an article ID, analogous to a message ID and used for similar purposes. The Newsgroups and Expires headers were approximately as now. The Received header contained the date when the latest relayer to process the article first saw it. All dates were in the above format, with all fields fixed width, resembling an Internet date but not quite the same.

This format is documented for archeological purposes only. Articles MUST NOT be generated in this format.

Appendix A.3 - Obsolete Headers

Early versions of news software following the modern format sometimes generated headers like the following:

      Relay-Version: version B 2.10 2/13/83; site cbosgd.UUCP
      Posting-Version: version B 2.10 2/13/83; site eagle.UUCP
      Date-Received: Friday, 19-Nov-82 16:59:30 EST

Relay-Version contained version information about the relayer that last processed the article. Posting-Version contained version information about the posting agent that posted the article. Date-Received contained the date when the last relayer to process the article first saw it (in a slightly nonstandard format).

In addition, this present standard obsoletes certain headers defined in [Son-of-1036] (see 6.22):

      Also-Control: cancel <9urrt98y53@site.example>
      See-Also:
      Article-Names: comp.foo:charter
      Article-Updates:

Also-Control indicated a control message that was also intended to be filed as a normal article. See-Also listed related articles, but without the specific relationship with followups that pertains to the References header. Article-Names indicated some special significance of that article in relation to the indicated newsgroup. Article-Updates indicated that an earlier article was updated, without at the same time being superseded.

These headers are documented for archeological purposes only. Articles containing these headers MUST NOT be generated.

Appendix A.4 - Obsolete Control Messages

This present standard obsoletes certain control messages defined in [RFC 1036] (see 7.7), all of which had the effect of requesting a description of a relaying or serving agent's software, or its peering arrangements with neighbouring sites, to be emailed to the article's reply address. Whilst of some utility when Usenet was much smaller than it is now, they had become no more than a tool for the malicious sending of mailbombs.
Moreover, many organizations now consider information about their internal connectivity to be confidential.

      version
      sendsys
      whogets
      senduuname

"Version" requested details of the transport software in use at a site. "Sendsys" requested the full list of newsgroups taken, and the peering arrangements. "Whogets" was similar, but restricted to a named newsgroup. "Senduuname" resembled "sendsys" but was restricted to the list of peers connected by UUCP.

Historically, a checkgroups body consisting of one or two lines, the first of the form "-n newsgroup", caused checkgroups to apply to only that single newsgroup.

Historically, an article posted to a newsgroup whose name had exactly three components of which the third was "ctl" signified that the article was to be taken as a control message. The Subject header specified the actions, in the same way the Control header does now.

These forms are documented for archeological purposes only; they MUST NO LONGER be used.

Appendix B - Collected Syntax

TO BE DONE