X-Git-Url: http://erislabs.net/gitweb/?a=blobdiff_plain;f=doc%2Fstandards.texi;h=75c154c94ea2bad3e4d0fb2bc186c83de92381c7;hb=0aee6c1867357efc96226bf37b342ba876e5d0f3;hp=6c9532c135274b3eb0371d2786b2ee23d2f80551;hpb=e46439322bc02b44918480e711eb82c4aef5547b;p=gnulib.git diff --git a/doc/standards.texi b/doc/standards.texi index 6c9532c13..75c154c94 100644 --- a/doc/standards.texi +++ b/doc/standards.texi @@ -3,7 +3,7 @@ @setfilename standards.info @settitle GNU Coding Standards @c This date is automagically updated when you save this file: -@set lastupdate November 29, 2004 +@set lastupdate February 8, 2006 @c %**end of header @dircategory GNU organization @@ -33,7 +33,7 @@ The GNU coding standards, last updated @value{lastupdate}. Copyright (C) 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, -2001, 2002, 2003, 2004 Free Software Foundation, Inc. +2001, 2002, 2003, 2004, 2005, 2006 Free Software Foundation, Inc. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.1 @@ -239,9 +239,9 @@ C'' as a label for the compiler rather than for the language. Please don't use ``win'' as an abbreviation for Microsoft Windows in GNU software or documentation. In hacker terminology, calling -something a "win" is a form of praise. If you wish to praise +something a ``win'' is a form of praise. If you wish to praise Microsoft Windows when speaking on your own, by all means do so, but -not in GNU software. Usually we write the word ``windows'' in full, +not in GNU software. Usually we write the name ``Windows'' in full, but when brevity is very important (as in file names and sometimes symbol names), we abbreviate it to ``w''. For instance, the files and functions in Emacs that deal with Windows start with @samp{w32}. @@ -518,6 +518,8 @@ software. It also describes general standards for error messages, the command line interface, and how libraries should behave. 
@menu +* Non-GNU Standards:: We consider standards such as POSIX; + we don't "obey" them. * Semantics:: Writing robust programs * Libraries:: Library behavior * Errors:: Formatting error messages @@ -529,6 +531,50 @@ command line interface, and how libraries should behave. * File Usage:: Which files to use, and where @end menu +@node Non-GNU Standards +@section Non-GNU Standards + +The GNU Project regards standards published by other organizations as +suggestions, not orders. We consider those standards, but we do not +``obey'' them. In developing a GNU program, you should implement +an outside standard's specifications when that makes the GNU system +better overall in an objective sense. When it doesn't, you shouldn't. + +In most cases, following published standards is convenient for +users---it means that their programs or scripts will work more +portably. For instance, GCC implements nearly all the features of +Standard C as specified by that standard. C program developers would +be unhappy if it did not. And GNU utilities mostly follow +specifications of POSIX.2; shell script writers and users would be +unhappy if our programs were incompatible. + +But we do not follow either of these specifications rigidly, and there +are specific points on which we decided not to follow them, so as to +make the GNU system better for users. + +For instance, Standard C says that nearly all extensions to C are +prohibited. How silly! GCC implements many extensions, some of which +were later adopted as part of the standard. If you want these +constructs to give an error message as ``required'' by the standard, +you must specify @samp{--pedantic}, which was implemented only so that +we can say ``GCC is a 100% implementation of the standard,'' not +because there is any reason to actually use it. + +POSIX.2 specifies that @samp{df} and @samp{du} must output sizes by +default in units of 512 bytes. What users want is units of 1k, so +that is what we do by default. 
If you want the ridiculous behavior +``required'' by POSIX, you must set the environment variable +@samp{POSIXLY_CORRECT} (which was originally going to be named +@samp{POSIX_ME_HARDER}). + +GNU utilities also depart from the letter of the POSIX.2 specification +when they support long-named command-line options, and intermixing +options with ordinary arguments. This minor incompatibility with +POSIX is never a problem in practice, and it is very useful. + +In particular, don't reject a new feature, or remove an old one, +merely because a standard says it is ``forbidden'' or ``deprecated.'' + @node Semantics @section Writing Robust Programs @@ -850,7 +896,7 @@ All programs should support two standard options: @samp{--version} and @samp{--help}. CGI programs should accept these as command-line options, and also if given as the @env{PATH_INFO}; for instance, visiting @url{http://example.org/p.cgi/--help} in a browser should -output the same information as inokving @samp{p.cgi --help} from the +output the same information as invoking @samp{p.cgi --help} from the command line. @table @code @@ -1492,9 +1538,7 @@ Used in @code{gawk}. Used in @code{su}. @item machine -No listing of which programs already use this; -someone should check to -see if any actually do, and tell @email{gnu@@gnu.org}. +Used in @code{uname}. @item macro-name @samp{-M} in @code{ptx}. @@ -2144,6 +2188,8 @@ when writing GNU software. * CPU Portability:: Supporting the range of CPU types * System Functions:: Portability and ``standard'' library functions * Internationalization:: Techniques for internationalization +* Character Set:: Use ASCII by default. +* Quote Characters:: Use `...' in the C locale. * Mmap:: How you can safely use @code{mmap}. @end menu @@ -2154,21 +2200,20 @@ when writing GNU software. 
@cindex open brace @cindex braces, in C source It is important to put the open-brace that starts the body of a C -function in column zero, and avoid putting any other open-brace or -open-parenthesis or open-bracket in column zero. Several tools look -for open-braces in column zero to find the beginnings of C functions. +function in column one, and avoid putting any other open-brace or +open-parenthesis or open-bracket in column one. Several tools look +for open-braces in column one to find the beginnings of C functions. These tools will not work on code not formatted that way. It is also important for function definitions to start the name of the -function in column zero. This helps people to search for function +function in column one. This helps people to search for function definitions, and may also help certain tools recognize them. Thus, -the proper format is this: +using Standard C syntax, the format is this: @example static char * -concat (s1, s2) /* Name starts in column zero here */ - char *s1, *s2; -@{ /* Open brace in column zero here */ +concat (char *s1, char *s2) +@{ @dots{} @} @end example @@ -2179,8 +2224,9 @@ this: @example static char * -concat (char *s1, char *s2) -@{ +concat (s1, s2) /* Name starts in column one here */ + char *s1, *s2; +@{ /* Open brace in column one here */ @dots{} @} @end example @@ -2298,7 +2344,13 @@ page. The formfeeds should appear alone on lines by themselves. @cindex commenting Every program should start with a comment saying briefly what it is for. -Example: @samp{fmt - filter for simple filling of text}. +Example: @samp{fmt - filter for simple filling of text}. This comment +should be at the top of the source file containing the @samp{main} +function of the program. + +Also, please write a brief comment at the start of each source file, +with the file name and a line or two about the overall purpose of the +file. 
Please write the comments in a GNU program in English, because English is the one language that nearly all programmers in all countries can @@ -2576,7 +2628,7 @@ constants. @cindex file-name limitations @pindex doschk You might want to make sure that none of the file names would conflict -the files were loaded onto an MS-DOS file system which shortens the +if the files were loaded onto an MS-DOS file system which shortens the names. You can use the program @code{doschk} to test for this. Some GNU programs were designed to limit themselves to file names of 14 @@ -2618,11 +2670,11 @@ Avoid using the format of semi-internal data bases (e.g., directories) when there is a higher-level alternative (@code{readdir}). @cindex non-@sc{posix} systems, and portability -As for systems that are not like Unix, such as MSDOS, Windows, the -Macintosh, VMS, and MVS, supporting them is often a lot of work. When -that is the case, it is better to spend your time adding features that -will be useful on GNU and GNU/Linux, rather than on supporting other -incompatible systems. +As for systems that are not like Unix, such as MSDOS, Windows, VMS, MVS, +and older Macintosh systems, supporting them is often a lot of work. +When that is the case, it is better to spend your time adding features +that will be useful on GNU and GNU/Linux, rather than on supporting +other incompatible systems. If you do support Windows, please do not abbreviate it as ``win''. In hacker terminology, calling something a ``win'' is a form of praise. @@ -2667,7 +2719,7 @@ printf ("diff = %ld\n", (long) (pointer2 - pointer1)); @end example 1989 Standard C requires this to work, and we know of only one -counterexample: 64-bit programs on Microsoft Windows IA-64. We will +counterexample: 64-bit programs on Microsoft Windows. We will leave it to those who want to port GNU programs to that environment to figure out how to do it. 
@@ -2687,37 +2739,50 @@ while ((c = getchar()) != EOF) write(file_descriptor, &c, 1); @end example -When calling functions, you need not worry about the difference between -pointers of various types, or between pointers and integers. On most -machines, there's no difference anyway. As for the few machines where -there is a difference, all of them support Standard C prototypes, so you can -use prototypes (perhaps conditionalized to be active only in Standard C) -to make the code work on those systems. +It used to be ok to not worry about the difference between pointers +and integers when passing arguments to functions. However, on most +modern 64-bit machines pointers are wider than @code{int}. +Conversely, integer types like @code{long long int} and @code{off_t} +are wider than pointers on most modern 32-bit machines. Hence it's +often better nowadays to use prototypes to define functions whose +argument types are not trivial. -In certain cases, it is ok to pass integer and pointer arguments -indiscriminately to the same function, and use no prototype on any -system. For example, many GNU programs have error-reporting functions -that pass their arguments along to @code{printf} and friends: +In particular, if functions accept varying argument counts or types +they should be declared using prototypes containing @samp{...} and +defined using @file{stdarg.h}. For an example of this, please see the +@uref{http://www.gnu.org/software/gnulib/, Gnulib} error module, which +declares and defines the following function: @example -error (s, a1, a2, a3) - char *s; - char *a1, *a2, *a3; -@{ - fprintf (stderr, "error: "); - fprintf (stderr, s, a1, a2, a3); -@} +/* Print a message with `fprintf (stderr, FORMAT, ...)'; + if ERRNUM is nonzero, follow it with ": " and strerror (ERRNUM). + If STATUS is nonzero, terminate the program with `exit (STATUS)'. 
*/ + +void error (int status, int errnum, const char *format, ...); @end example -@noindent -In practice, this works on all machines, since a pointer is generally -the widest possible kind of argument; it is much simpler than any -``correct'' alternative. Be sure @emph{not} to use a prototype for such -functions. +A simple way to use the Gnulib error module is to obtain the two +source files @file{error.c} and @file{error.h} from the Gnulib library +source code repository at +@uref{http://savannah.gnu.org/cgi-bin/viewcvs/gnulib/gnulib/lib/}. +Here's a sample use: + +@example +#include "error.h" +#include <errno.h> +#include <stdio.h> + +char *program_name = "myprogram"; -If you have decided to use Standard C, then you can instead define -@code{error} using @file{stdarg.h}, and pass the arguments along to -@code{vfprintf}. +FILE * +xfopen (char const *name) +@{ + FILE *fp = fopen (name, "r"); + if (! fp) + error (1, errno, "cannot read %s", name); + return fp; +@} +@end example @cindex casting pointers to integers Avoid casting pointers to integers if you can. Such casts greatly @@ -2958,6 +3023,63 @@ printf (f->tried_implicit : "# Implicit rule search has not been done.\n"); @end example + +@node Character Set +@section Character Set +@cindex character set +@cindex encodings +@cindex ASCII characters +@cindex non-ASCII characters + +Sticking to the ASCII character set (plain text, 7-bit characters) is +preferred in GNU source code comments, text documents, and other +contexts, unless there is good reason to do something else because of +the application domain. For example, if source code deals with the +French Revolutionary calendar, it is OK if its literal strings contain +accented characters in month names like ``Flor@'eal''. Also, it is OK +to use non-ASCII characters to represent proper names of contributors in +change logs (@pxref{Change Logs}). 
+ +If you need to use non-ASCII characters, you should normally stick with +one encoding, as one cannot in general mix encodings reliably. + + +@node Quote Characters +@section Quote Characters +@cindex quote characters +@cindex locale-specific quote characters +@cindex left quote +@cindex grave accent + +In the C locale, GNU programs should stick to plain ASCII for quotation +characters in messages to users: preferably 0x60 (@samp{`}) for left +quotes and 0x27 (@samp{'}) for right quotes. It is ok, but not +required, to use locale-specific quotes in other locales. + +The @uref{http://www.gnu.org/software/gnulib/, Gnulib} @code{quote} and +@code{quotearg} modules provide a reasonably straightforward way to +support locale-specific quote characters, as well as taking care of +other issues, such as quoting a filename that itself contains a quote +character. See the Gnulib documentation for usage details. + +In any case, the documentation for your program should clearly specify +how it does quoting, if different than the preferred method of @samp{`} +and @samp{'}. This is especially important if the output of your +program is ever likely to be parsed by another program. + +Quotation characters are a difficult area in the computing world at +this time: there are no true left or right quote characters in Latin1; +the @samp{`} character we use was standardized there as a grave +accent. Moreover, Latin1 is still not universally usable. + +Unicode contains the unambiguous quote characters required, and its +common encoding UTF-8 is upward compatible with Latin1. However, +Unicode and UTF-8 are not universally well-supported, either. + +This may change over the next few years, and then we will revisit +this. + + @node Mmap @section Mmap @findex mmap @@ -3076,9 +3198,9 @@ functions, variables, options, and important concepts that are part of the program. One combined Index should do for a short manual, but sometimes for a complex package it is better to use multiple indices. 
The Texinfo manual includes advice on preparing good index entries, see -@ref{Index Entries, , Making Index Entries, texinfo, The GNU Texinfo -Manual}, and see @ref{Indexing Commands, , Defining the Entries of an -Index, texinfo, The GNU Texinfo manual}. +@ref{Index Entries, , Making Index Entries, texinfo, GNU Texinfo}, and +see @ref{Indexing Commands, , Defining the Entries of an +Index, texinfo, GNU Texinfo}. Don't use Unix man pages as a model for how to write GNU documentation; most of them are terse, badly structured, and give inadequate @@ -3587,20 +3709,21 @@ For example, an Athlon-based GNU/Linux system might be The @code{configure} script needs to be able to decode all plausible alternatives for how to describe a machine. Thus, -@samp{athlon-pc-gnu/linux} would be a valid alias. -There is a shell script called -@uref{ftp://ftp.gnu.org/gnu/config/config.sub, @file{config.sub}} -that you can use -as a subroutine to validate system types and canonicalize aliases. +@samp{athlon-pc-gnu/linux} would be a valid alias. There is a shell +script called +@uref{http://savannah.gnu.org/cgi-bin/viewcvs/*checkout*/config/config/config.sub, +@file{config.sub}} that you can use as a subroutine to validate system +types and canonicalize aliases. The @code{configure} script should also take the option @option{--build=@var{buildtype}}, which should be equivalent to a plain @var{buildtype} argument. For example, @samp{configure --build=i686-pc-linux-gnu} is equivalent to @samp{configure i686-pc-linux-gnu}. When the build type is not specified by an option -or argument, the @code{configure} script should normally guess it -using the shell script -@uref{ftp://ftp.gnu.org/gnu/config/config.guess, @file{config.guess}}. +or argument, the @code{configure} script should normally guess it using +the shell script +@uref{http://savannah.gnu.org/cgi-bin/viewcvs/*checkout*/config/config/config.guess, +@file{config.guess}}. 
@cindex optional features, configure-time Other options are permitted to specify in more detail the software @@ -3784,9 +3907,11 @@ advertise them to new potential customers. Proprietary software is a social and ethical problem, and the point of GNU is to solve that problem. -The GNU definition of free software is found in -@url{http://www.gnu.org/philosophy/free-sw.html}, with a list of -important licenses and whether they qualify as free in +The GNU definition of free software is found on the GNU web site at +@url{http://www.gnu.org/philosophy/free-sw.html}, and the definition +of free documentation is found at +@url{http://www.gnu.org/philosophy/free-doc.html}. A list of +important licenses and whether they qualify as free is in @url{http://www.gnu.org/licenses/license-list.html}. The terms ``free'' and ``non-free'', used in this document, refer to that definition. If it is not clear whether a license qualifies as free @@ -3843,7 +3968,7 @@ scope of an operating system project. Referring to a web site that describes or recommends a non-free program is in effect promoting that software, so please do not make links (or mention by name) web sites that contain such material. This -policy is relevant particulary for the web pages for a GNU package. +policy is relevant particularly for the web pages for a GNU package. Following links from nearly any web site can lead to non-free software; this is an inescapable aspect of the nature of the web, and