Network Working Group M. Nottingham Request for Comments: 5005 September 2007 Category: Standards Track
Feed Paging and Archiving
Status of This Memo
This document specifies an Internet standards track protocol for the Internet community, and requests discussion and suggestions for improvements. Please refer to the current edition of the "Internet Official Protocol Standards" (STD 1) for the standardization state and status of this protocol. Distribution of this memo is unlimited.
This specification defines three types of syndicated Web feeds that enable publication of entries across one or more feed documents. This includes "paged" feeds for piecemeal access, "archived" feeds that allow reconstruction of the feed's contents, and feeds that are explicitly "complete".
Syndicated Web feeds (using formats such as Atom ) are often split into multiple documents to save bandwidth, allow "sliding window" access, or for other purposes.
This specification formalizes two types of feeds that can span one or more feed documents; "paged" feeds and "archived" feeds. Additionally, it defines "complete" feeds to cover the case when a single feed document explicitly represents all of the feed's entries.
Each has different properties and trade-offs:
o Complete feeds contain the entire set of entries in one document, and can be useful when it isn't desirable to "remember" previously-seen entries.
o Paged feeds split the entries among multiple temporary documents. This can be useful when entries in the feed are not long-lived or stable, and the client needs to access an arbitrary portion of them, usually in close succession.
o Archived feeds split the entries among multiple permanent documents and can be useful when entries are long-lived, and it is important for clients to see every one.
The semantics of a feed that combines these types is undefined by this specification.
Although they refer to Atom normatively, the mechanisms described herein can be used with similar syndication formats; see Appendix B for one such use.
In this specification, "feed document" refers to an Atom Feed Document or similar syndication instance document. It may contain any number of entries, and may or may not be a complete representation of the logical feed.
A "logical feed" is the complete set of entries associated with a feed (as contrasted with a feed document, which may contain a subset of entries).
"Head section" refers to a document's feed-wide metadata container; e.g., the child elements of the atom:feed element in an Atom Feed Document.
This specification uses terms from the XML Infoset . However, this specification uses a shorthand; the phrase "Information Item" is omitted when naming Element Information Items. Therefore, when this specification uses the term "element," it is referring to an Element Information Item in Infoset terms.
This specification also uses Atom link relations to identify different types of links; see the Atom specification  for information about their syntax, and the IANA link relation registry for more information about specific values.
Note that URI references in link relation values may be relative, and when they are used they must be absolutised, as described in Section 5.1 of .
A complete feed is a feed document that contains all of the entries of a logical feed; any entry not actually in the feed document SHOULD NOT be considered part of that feed.
For example, a feed that represents a ranking that varies over time (such as "Top Twenty Records" or "Most Popular Items") should not have newer entries displayed alongside older ones. By marking this feed as complete, old entries are discarded when it is refreshed.
The fh:complete element, when present in a feed's head section, indicates that the feed document it occurs in is a complete representation of the logical feed's entries. It is an empty element; this specification does not define any content for it.
A paged feed is a set of linked feed documents that together contain the entries of a logical feed, without any guarantees about the stability of each document's contents.
Paged feeds are lossy; that is, it is not possible to guarantee that clients will be able to reconstruct the contents of the logical feed at a particular time. Entries may be added or changed as the pages of the feed are accessed, without the client becoming aware of them.
Therefore, clients SHOULD NOT present paged feeds as coherent or complete, or make assumptions to that effect.
Paged feeds can be useful when the number of entries is very large, infinite, or indeterminate. Clients can "page" through the feed, only accessing a subset of the feed's entries as necessary.
Nottingham Standards Track [Page 4]
RFC 5005 Feed Paging and Archiving September 2007
For example, a search engine might make query results available as a paged feed, so that queries with very large result sets do not overwhelm the server, the network, or the client.
The feed documents in a paged feed are tied together with the following link relations:
o "first" - A URI that refers to the furthest preceding document in a series of documents.
o "last" - A URI that refers to the furthest following document in a series of documents.
o "previous" - A URI that refers to the immediately preceding document in a series of documents.
o "next" - A URI that refers to the immediately following document in a series of documents.
Paged feed documents MUST have at least one of these link relations present, and should contain as many as practical and applicable.
An archived feed is a set of feed documents that can be combined to accurately reconstruct the entries of a logical feed.
Unlike paged feeds, archived feeds enable clients to do this without losing entries. This is achieved by publishing a single subscription document and (potentially) many archive documents.
A subscription document is a feed document that always contains the most recently added or changed entries available in the logical feed.
Archive documents are feed documents that contain less recent entries in the feed. The set of entries contained in an archive document published at a particular URI SHOULD NOT change over time. Likewise, the URI for a particular archive document SHOULD NOT change over time.
The following link relations are used to tie subscription and archived feeds together:
o "prev-archive" - A URI that refers to the immediately preceding archive document.
o "next-archive" - A URI that refers to the immediately following archive document.
o "current" - A URI that, when dereferenced, returns a feed document containing the most recent entries in the feed.
Subscription documents and archive documents MUST have a "prev- archive" link relation, unless there are no preceding archives available. Archive documents SHOULD also have a "next-archive" link relation, unless there are no following archives available.
Archive documents SHOULD indicate their associated subscription documents using the "current" link relation.
Archive documents SHOULD also contain an fh:archive element in their head sections to indicate that they are archives. fh:archive is an empty element; this specification does not define any content for it.
In this example, the feed archives are split into monthly chunks, and the subscription document points to the most recent complete archive "http://example.org/2003/11/index.atom" using the "prev-archive" relation. That document, in turn points to the previous archive "http://example.org/2003/10/index.atom", and so on. Note that the "2003/11" archive does not have a "next-archive" relation, because it is the most recent complete archive; although another archive ("2003/12") may be under construction, it would be an error to link to it before completion.
The requirement that archive documents be stable allows clients to safely assume that if they have retrieved one in the past, it will not meaningfully change in the future. As a result, if an archive document's contents are changed, some clients may not become aware of the changes.
Therefore, if a publisher requires a change to be visible to all users (e.g., correcting factual errors), they should consider publishing the revised entry in the subscription document, in addition to (or instead of) the appropriate archive document. Conversely, unimportant changes (e.g., spelling corrections) might be only effected in archive documents.
Publishers SHOULD construct their feed documents in such a way as to make duplicate removal unambiguous (see Section 4.2).
Publishers are not required to make all archive documents available; they may refuse to serve (e.g., with HTTP status code 403 or 410) or be unable to serve (e.g., with HTTP status code 404) an archive document.
Typically, clients will "subscribe" to an archived feed by polling the subscription document for recent changes. If a URI contained in the prev-archive link relation has not been processed in the past, the client can "catch up" with any missed entries by dereferencing it and adding the contained entries to the logical feed. This process should be repeated recursively until the client encounters a prev- archive link relation that has been processed (the end of the archive is indicated by a missing prev-archive link relation) or an error is encountered.
If duplicate entries are found, clients SHOULD consider only the most recently updated entry to be part of the logical feed. If duplicate entries have the same update time-stamp, or no time-stamps are
Nottingham Standards Track [Page 8]
RFC 5005 Feed Paging and Archiving September 2007
available, the entry sourced from the most recently updated feed document SHOULD replace all other duplicates of that entry.
In Atom-formatted archived feeds, two entries are duplicates if they have the same atom:id element. The update time of an entry is determined by its atom:updated element, and likewise the update time of a feed document is determined by its feed-level atom:updated element.
Clients SHOULD warn users when they are not able to reconstruct the entire logical feed (e.g., by alerting the user that an archive document is unavailable, or displaying pseudo-entries that inform the user that some entries may be missing).
Feeds using this mechanism have the same security considerations as Atom . Encryption and authentication security services can be obtained by encrypting and/or signing the feed, as described in , and may also be obtained through channel-based mechanisms (e.g., TLS , HTTP authentication ) and/or transport (e.g., IPsec ).
Feeds using these mechanisms could be crafted in such a way as to cause a client to initiate excessive (or even an unending sequence of) network requests, causing denial of service (either to the client, the target server, and/or intervening networks). Clients can mitigate this risk by requiring user intervention after a certain number of requests, or by limiting requests either according to a
Nottingham Standards Track [Page 9]
RFC 5005 Feed Paging and Archiving September 2007
hard limit, or with heuristics. Servers can mitigate this risk by denying requests that they consider abusive (e.g., by closing the connection or generating an error).
Clients should be mindful of resource limits when storing feed documents. To reiterate, they are not required to always store or reconstruct the feed when conforming to this specification; they only need to inform the user when the reconstructed feed is not complete.
This specification does not define what it means when a logical feed's component feed documents have different security mechanisms applied.
The author would like to thank the following people for their contributions, comments, and help: Danny Ayers, Thomas Broyer, Lisa Dusseault, Stefan Eissing, David Hall, Bill de Hora, Vidya Narayanan, Aristotle Pagaltzis, John Panzer, Dave Pawson, Garrett Rooney, Robert Sayre, James Snell, Henry Story, and Franklin Tse.
Any errors herein remain the author's, not theirs.
As previously noted, while this specification's extensions are described in terms of the Atom feed format, they are also useful in similar formats. This informative appendix demonstrates how they can be used in an RSS 2.0-formatted  feed.
In RSS 2.0-formatted feeds, two entries are duplicates if they have the same guid element. The update time of an entry is not defined by RSS 2.0, but the feed-level update time can be determined by the lastBuildDate element, if present.
This document is subject to the rights, licenses and restrictions contained in BCP 78, and except as set forth therein, the authors retain all their rights.
This document and the information contained herein are provided on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
The IETF takes no position regarding the validity or scope of any Intellectual Property Rights or other rights that might be claimed to pertain to the implementation or use of the technology described in this document or the extent to which any license under such rights might or might not be available; nor does it represent that it has made any independent effort to identify any such rights. Information on the procedures with respect to rights in RFC documents can be found in BCP 78 and BCP 79.
Copies of IPR disclosures made to the IETF Secretariat and any assurances of licenses to be made available, or the result of an attempt made to obtain a general license or permission for the use of such proprietary rights by implementers or users of this specification can be obtained from the IETF on-line IPR repository at http://www.ietf.org/ipr.
The IETF invites any interested party to bring to its attention any copyrights, patents or patent applications, or other proprietary rights that may cover technology that may be required to implement this standard. Please address the information to the IETF at email@example.com.