Internet Engineering Task Force (IETF)                        R. Housley
Request for Comments: 7760                                Vigil Security
Category: Informational                                     January 2016
ISSN: 2070-1721
Statement of Work for Extensions to the IETF Datatracker for Author Statistics
Abstract
This is the Statement of Work (SOW) for extensions to the IETF Datatracker to provide statistics about RFCs and Internet-Drafts and their authors.
Status of This Memo
This document is not an Internet Standards Track specification; it is published for informational purposes.
This document is a product of the Internet Engineering Task Force (IETF). It represents the consensus of the IETF community. It has received public review and has been approved for publication by the Internet Engineering Steering Group (IESG). Not all documents approved by the IESG are a candidate for any level of Internet Standard; see Section 2 of RFC 5741.
Information about the current status of this document, any errata, and how to provide feedback on it may be obtained at http://www.rfc-editor.org/info/rfc7760.
Copyright Notice
Copyright (c) 2016 IETF Trust and the persons identified as the document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.
A prominent member of the IETF community has developed a set of tools to produce statistics about the authors of RFCs and Internet-Drafts. These tools analyze the documents themselves to produce statistics about the documents and their authors. The goal of the IETF Datatracker enhancements described in this document is to provide similar statistics and ensure that the software is maintained as part of the IETF information services. While some data may still need to be extracted from the documents themselves, as much data as possible should be maintained in the IETF Datatracker database [DATATRACKER].
Author statistics allow the community to understand where work is being done and by whom. The statistics make it visible which individuals, companies, and geographic regions are the most active contributors. The statistics also show how these are changing over the years.
Housley Informational [Page 2]
RFC 7760 SOW for Author Statistics January 2016
Some of the statistics provide "nice to know" information; however, others are sometimes used to refer to a particular participant's contributions in the IETF or used to study trends within IETF work. For instance, the IETF has been trying to increase the diversity of participants, and the statistics are one way to see the impact of those efforts. Also, the most active individuals are potential candidates for various leadership positions.
The enhancements to the IETF Datatracker shall provide statistics and graphs about documents, document authors, author affiliation, author country, and author continent.
The statistics should also include trends relating to IETF meeting attendees, which the current tools do not track.
For the purposes of these requirements, "recent Internet-Drafts" and "recent RFCs" cover documents that have been published in the last five years.
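The five-year window can be expressed as a simple date comparison. The following is a minimal sketch, not part of the requirements; the function name is_recent, the sample document names, and the choice to treat the cutoff date as inclusive are all assumptions made for illustration.

```python
from datetime import date


def is_recent(published: date, today: date, years: int = 5) -> bool:
    """Return True if `published` falls within the last `years` years of `today`."""
    try:
        cutoff = today.replace(year=today.year - years)
    except ValueError:
        # `today` is 29 February and the cutoff year is not a leap year.
        cutoff = today.replace(year=today.year - years, day=28)
    # Assumption: the cutoff date itself counts as "recent".
    return cutoff <= published <= today


# Hypothetical sample data: (document name, publication date).
docs = [
    ("rfc-example-old", date(2009, 6, 1)),
    ("rfc-example-new", date(2014, 3, 15)),
]
recent = [name for name, published in docs
          if is_recent(published, today=date(2016, 1, 31))]
```

With the sample data above, only "rfc-example-new" is selected. Passing the reference date explicitly, rather than reading the clock inside the function, keeps the statistics reproducible.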
The statistics shall provide insight into the number of authors per document. The current web page presents the statistics and a bar chart. The current web page can be seen at http://www.arkko.com/tools/rfcstats/authdistr.html.
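As an illustration of the data behind such a bar chart, the sketch below counts how many documents have one author, two authors, and so on. It assumes a hypothetical mapping from document name to author list; the real data would come from the Datatracker database, whose schema is not described in this document.

```python
from collections import Counter


def author_count_distribution(authors_by_doc):
    """Map each author count to the number of documents with that many authors."""
    counts = Counter(len(authors) for authors in authors_by_doc.values())
    return dict(sorted(counts.items()))


# Hypothetical sample data, not taken from any real document set.
sample = {
    "draft-example-a": ["Author One"],
    "draft-example-b": ["Author One", "Author Two"],
    "draft-example-c": ["Author Three", "Author Four"],
}
distribution = author_count_distribution(sample)
```

Each key of the result is a bar on the chart, and each value is the height of that bar.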
The statistics shall provide insight into the size of the documents. The current web page presents the statistics and a bar chart. The current web page can be seen at http://www.arkko.com/tools/allstats/pagedistr.html. With the planned change in document format, some other way to measure document size might be more appropriate, such as word count.
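If word count were chosen as the size measure, the distribution could be computed along the following lines. This is only a sketch under that assumption: splitting on whitespace is one possible definition of a "word", and the bucket size of 1000 words is an arbitrary illustrative choice.

```python
from collections import Counter


def word_count(text: str) -> int:
    """Count whitespace-separated tokens; one possible definition of a word."""
    return len(text.split())


def size_distribution(texts, bucket_size=1000):
    """Group documents into buckets keyed by the lower bound of each word-count range."""
    buckets = Counter(
        (word_count(text) // bucket_size) * bucket_size for text in texts
    )
    return dict(sorted(buckets.items()))
```

For example, documents of 3, 1500, and 1999 words fall into the buckets starting at 0, 1000, and 1000 words, respectively.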
Additionally, statistics about the document format used by the authors should be provided; the current tools do not provide them.
The statistics shall provide insight into the use of various specification techniques such as ABNF, ASN.1, C code, CBOR, JSON, and XML. The current web page does not include all of these techniques. The current web page can be seen at http://www.arkko.com/tools/allstats/formatdistr.html.
The statistics shall provide insight into the relative impact of authors by the number of their RFCs that are cited by other RFCs. The current web page can be seen at http://www.arkko.com/tools/rfcstats/hindextop.html.
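This document does not define the exact impact measure. One common way to summarize citation counts is an h-index: the largest h such that the author has h RFCs that are each cited by at least h other RFCs. The sketch below assumes that measure and assumes the per-RFC citation counts have already been gathered; both function names are hypothetical.

```python
def cited_rfc_count(citation_counts):
    """Number of an author's RFCs that are cited by at least one other RFC."""
    return sum(1 for cites in citation_counts if cites > 0)


def h_index(citation_counts):
    """Largest h such that h of the RFCs have at least h citations each."""
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h
```

For an author whose five RFCs are cited 10, 8, 5, 4, and 3 times, h_index returns 4, because four of the RFCs have at least four citations each.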
The statistics shall provide insight into countries represented by attendees at each IETF meeting. Country-based statistics have been presented in the plenary session for many years. For consistency with the author statistics discussed in Section 3 of this document, the statistics will include a way to show the EU as a single "country" for the sake of comparison with other large countries. The statistics for each meeting should be accompanied with a pie chart that shows the eight countries with the highest number of attendees and "other".
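The pie chart data described above can be sketched as follows. The function name, the two-letter country codes, and the EU membership set are assumptions for illustration; in particular, the membership set shown is a deliberately incomplete placeholder, and a real implementation would need the full list as of each meeting date.

```python
from collections import Counter

# Illustrative subset only, NOT the full EU membership list.
EU_MEMBERS = {"DE", "FR", "FI", "SE", "NL"}


def attendee_shares(countries, top_n=8, merge_eu=False):
    """Return attendee counts for the top_n countries plus an 'other' slice.

    `countries` holds one country code per attendee. With merge_eu=True,
    EU member states are counted together as the single entry 'EU'.
    """
    if merge_eu:
        countries = ["EU" if code in EU_MEMBERS else code for code in countries]
    counts = Counter(countries)
    top = counts.most_common(top_n)
    other = sum(counts.values()) - sum(count for _, count in top)
    slices = dict(top)
    if other:
        slices["other"] = other
    return slices
```

Applying the EU merge before ranking, rather than after, lets the combined EU entry compete for a place among the top eight in the same way as any single country.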
The statistics shall illustrate the change in number of IETF meeting attendees per country over time. Again, for consistency with the author statistics discussed in Section 3 of this document, the statistics will include a way to show the EU as a single "country".
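A per-country time series could be built as sketched below. The record layout (meeting number, country code, attendee count) and both function names are hypothetical; the sketch only shows the grouping step that would feed a trend graph.

```python
from collections import defaultdict


def attendance_trend(records):
    """Group (meeting, country, count) records into {country: {meeting: count}}."""
    trend = defaultdict(dict)
    for meeting, country, count in records:
        trend[country][meeting] = trend[country].get(meeting, 0) + count
    return {
        country: dict(sorted(by_meeting.items()))
        for country, by_meeting in trend.items()
    }


def change_between(trend, country, first, last):
    """Difference in attendee count for `country` between two meetings."""
    series = trend.get(country, {})
    return series.get(last, 0) - series.get(first, 0)


# Made-up numbers for illustration, not real attendance figures.
records = [(1, "AA", 400), (2, "AA", 350), (1, "BB", 60), (2, "BB", 90)]
trend = attendance_trend(records)
```

To show the EU as a single "country", the country codes in the records would be mapped to "EU" before grouping; counts for the same meeting are summed, so the merged entry accumulates correctly.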
Since the new code will be driven by the Datatracker database to the greatest extent possible, the existing code may be of limited value. The existing code was intended as a temporary solution and requires a rewrite. However, a set of heuristics used by the code may be useful. These heuristics are provided in a separate rule database and are used as a last resort when there is otherwise too little information. The heuristics include author aliases, some recognized authors and some recognized affiliations, domain name data for determining location and affiliation, and mappings for some ways that people represent their countries in a postal address.
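The "last resort" ordering can be illustrated with a small fallback function: data already in the database wins, and a domain name rule is consulted only when nothing better is known. This is a sketch under assumed names; the rule table format and the suffix-matching strategy are hypothetical and are not taken from the existing rule database.

```python
def infer_country(db_country, email, domain_rules):
    """Return the database country if present, else guess from the email domain.

    `domain_rules` is a hypothetical mapping from domain suffix to country code.
    """
    if db_country:
        return db_country
    if not email or "@" not in email:
        return None
    labels = email.rsplit("@", 1)[-1].lower().split(".")
    # Try the full domain first, then progressively shorter suffixes.
    for start in range(len(labels)):
        suffix = ".".join(labels[start:])
        if suffix in domain_rules:
            return domain_rules[suffix]
    return None
```

Returning None when no rule matches, instead of guessing, makes it easy to count how many records still lack enough information.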
Authors are not consistent about the way their names appear in various documents. For example, one document may include their given name and another document may include a nickname. The Datatracker database provides a way to capture aliases, but not all of the aliases in the documents have been added to the database.
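Alias handling amounts to mapping every known spelling of a name to one canonical person. The sketch below uses an in-memory dictionary in place of the Datatracker alias data, whose actual schema is not described here; the function names and the normalization rules (case folding, dropping periods, collapsing whitespace) are assumptions.

```python
def normalize(name):
    """Lowercase, drop periods, and collapse runs of whitespace."""
    return " ".join(name.replace(".", " ").lower().split())


def build_alias_index(aliases_by_person):
    """Map every normalized alias, and the canonical name itself, to the canonical name."""
    index = {}
    for canonical, aliases in aliases_by_person.items():
        for name in {canonical, *aliases}:
            index[normalize(name)] = canonical
    return index


def resolve_author(name, index):
    """Return the canonical name for `name`, or None if no alias is known."""
    return index.get(normalize(name))


# Hypothetical person and aliases.
index = build_alias_index({"Robert Example": ["Bob Example", "R. Example"]})
```

Names that resolve to None are the ones whose aliases have not yet been added to the database, so listing them is a direct way to find the gaps mentioned above.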
The current Datatracker database does not have tables for heuristics other than author aliases that are used in the current tool. Appropriate tables to hold the additional heuristics from the current rule database should be added to the Datatracker database in a manner agreed by the group of people that maintain the Datatracker source code.
A workable web interface, possibly using Django Admin, to update the new heuristics tables shall be provided.
The software is split into two parts, with the code itself being separate from the heuristics database. The two main components of the code are authorstats, which produces the statistics and generates the statistics web pages, and getauthors, which performs document analysis.
The current tools analyze the documents themselves to produce statistics. Some of the data needed to produce the statistics is not currently in the Datatracker database. This development effort will include adding the capability to capture this data in the Datatracker database and populate it for all RFCs and Internet-Drafts posted over the last five years. It may be cost-effective to leverage the existing code to extract the information and then verify it one time.
The URLs for the current tools exist in many places on the Web. Once a suitable replacement tool is available, the author of the original tools has promised to provide a suitable form of redirection.
With the planned change in document format, some of the information obtained through heuristics might be more directly extracted from the XML file. Once this format is being used by a significant number of authors, a future effort might move away from heuristics toward extraction from the XML file. To support this approach, it would be desirable for the new XML schema to make the author's continent of residence available, even if it is not used in the formatting of the human-readable document.
Currently, the Datatracker has information about document authors, but not other contributors. If information is added to the Datatracker in the future to cover contributors, then the statistics can be expanded to cover contributors as well as authors.
This document contains the SOW for enhancements to the IETF Datatracker to provide author statistics. These enhancements do not affect the security of the Internet. The enhancements provide statistics about documents that are available to the public without prior authentication, and the statistics will also be available to the public without prior authentication.