@setfilename standards.info
@settitle GNU Coding Standards
@c This date is automagically updated when you save this file:
-@set lastupdate June 8, 2005
+@set lastupdate February 8, 2006
@c %**end of header
@dircategory GNU organization
The GNU coding standards, last updated @value{lastupdate}.
Copyright (C) 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000,
-2001, 2002, 2003, 2004, 2005 Free Software Foundation, Inc.
+2001, 2002, 2003, 2004, 2005, 2006 Free Software Foundation, Inc.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.1
command line interface, and how libraries should behave.
@menu
+* Non-GNU Standards:: We consider standards such as POSIX;
+ we don't "obey" them.
* Semantics:: Writing robust programs
* Libraries:: Library behavior
* Errors:: Formatting error messages
* File Usage:: Which files to use, and where
@end menu
+@node Non-GNU Standards
+@section Non-GNU Standards
+
+The GNU Project regards standards published by other organizations as
+suggestions, not orders. We consider those standards, but we do not
+``obey'' them. In developing a GNU program, you should implement
+an outside standard's specifications when that makes the GNU system
+better overall in an objective sense. When it doesn't, you shouldn't.
+
+In most cases, following published standards is convenient for
+users---it means that their programs or scripts will work more
+portably. For instance, GCC implements nearly all the features of
+Standard C as specified by that standard. C program developers would
+be unhappy if it did not. And GNU utilities mostly follow
+specifications of POSIX.2; shell script writers and users would be
+unhappy if our programs were incompatible.
+
+But we do not follow either of these specifications rigidly, and there
+are specific points on which we decided not to follow them, so as to
+make the GNU system better for users.
+
+For instance, Standard C says that nearly all extensions to C are
+prohibited. How silly! GCC implements many extensions, some of which
+were later adopted as part of the standard. If you want these
+constructs to give an error message as ``required'' by the standard,
+you must specify @samp{--pedantic}, which was implemented only so that
+we can say ``GCC is a 100% implementation of the standard,'' not
+because there is any reason to actually use it.
+
+POSIX.2 specifies that @samp{df} and @samp{du} must output sizes by
+default in units of 512 bytes. What users want is units of 1k, so
+that is what we do by default. If you want the ridiculous behavior
+``required'' by POSIX, you must set the environment variable
+@samp{POSIXLY_CORRECT} (which was originally going to be named
+@samp{POSIX_ME_HARDER}).
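Here is a minimal sketch of how a utility might choose its reporting unit under this policy. It assumes hypothetical helpers `default_block_size` and `size_in_blocks`. This is not the actual coreutils code, which also honors other settings.

```c
#include <stdlib.h>

/* Hypothetical helper: the default reporting unit in bytes.  The
   GNU default is 1024; the POSIX-mandated 512 is used only when
   POSIXLY_CORRECT is present in the environment.  */
long
default_block_size (void)
{
  return getenv ("POSIXLY_CORRECT") != NULL ? 512 : 1024;
}

/* Hypothetical helper: convert BYTES to blocks of the default size,
   rounding up so that a partial block still counts as one block.  */
long long
size_in_blocks (long long bytes)
{
  long block = default_block_size ();
  return (bytes + block - 1) / block;
}
```

With POSIXLY_CORRECT unset, a 4096-byte file reports as 4 blocks. With it set, the same file reports as 8.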
+
+GNU utilities also depart from the letter of the POSIX.2 specification
+when they support long-named command-line options and allow options to
+be intermixed with ordinary arguments. This minor incompatibility with
+POSIX is never a problem in practice, and it is very useful.
+
+In particular, don't reject a new feature, or remove an old one,
+merely because a standard says it is ``forbidden'' or ``deprecated.''
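By default, glibc's `getopt_long` supports the long options and intermixed arguments mentioned above. The following is a minimal sketch. It assumes a hypothetical program with `--verbose` and `--output` options and a hypothetical `parse_args` helper. Note that glibc stops permuting arguments when POSIXLY_CORRECT is set.

```c
#include <getopt.h>
#include <stddef.h>

/* Hypothetical option set, for illustration only.  */
static const struct option long_options[] = {
  {"verbose", no_argument, NULL, 'v'},
  {"output", required_argument, NULL, 'o'},
  {NULL, 0, NULL, 0}
};

struct settings
{
  int verbose;
  const char *output;
};

/* Hypothetical helper: parse ARGV into *S.  Return the index of the
   first non-option argument, or -1 on an unrecognized option.
   getopt_long moves non-option arguments after the options.  */
int
parse_args (int argc, char **argv, struct settings *s)
{
  int c;

  s->verbose = 0;
  s->output = NULL;
  optind = 0;  /* glibc treats 0 as a request to fully reset getopt.  */
  while ((c = getopt_long (argc, argv, "vo:", long_options, NULL)) != -1)
    switch (c)
      {
      case 'v':
        s->verbose = 1;
        break;
      case 'o':
        s->output = optarg;
        break;
      default:
        return -1;
      }
  return optind;
}
```

Consider `prog a.txt --verbose --output=out.txt b.txt`. Both options are recognized even though they follow `a.txt`, and the two file names end up at `argv[3]` and `argv[4]`.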
+
@node Semantics
@section Writing Robust Programs
* CPU Portability:: Supporting the range of CPU types
* System Functions:: Portability and ``standard'' library functions
* Internationalization:: Techniques for internationalization
+* Character Set:: Use ASCII by default.
+* Quote Characters:: Use `...' in the C locale.
* Mmap:: How you can safely use @code{mmap}.
@end menu
@cindex open brace
@cindex braces, in C source
It is important to put the open-brace that starts the body of a C
-function in column zero, and avoid putting any other open-brace or
-open-parenthesis or open-bracket in column zero. Several tools look
-for open-braces in column zero to find the beginnings of C functions.
+function in column one, and avoid putting any other open-brace or
+open-parenthesis or open-bracket in column one. Several tools look
+for open-braces in column one to find the beginnings of C functions.
These tools will not work on code not formatted that way.
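To see why the layout matters, consider a minimal sketch of the kind of heuristic such tools apply: any line whose first character is an open brace is taken as the start of a function body. The helper `count_defun_starts` is hypothetical and much simpler than what real tools do.

```c
#include <string.h>

/* Hypothetical helper: count the lines in TEXT whose first character
   (column one) is an open brace.  A tool using this heuristic treats
   each such line as the start of a function body.  */
int
count_defun_starts (const char *text)
{
  int count = 0;
  const char *line = text;

  while (line != NULL && *line != '\0')
    {
      if (*line == '{')
        count++;
      /* Advance to the character after the next newline, if any.  */
      line = strchr (line, '\n');
      if (line != NULL)
        line++;
    }
  return count;
}
```

An inner block indented by its surrounding code is not counted. A stray open brace placed in column one would be miscounted as a new function.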
It is also important for function definitions to start the name of the
-function in column zero. This helps people to search for function
+function in column one. This helps people to search for function
definitions, and may also help certain tools recognize them. Thus,
using Standard C syntax, the format is this:
@example
static char *
-concat (s1, s2) /* Name starts in column zero here */
+concat (s1, s2) /* Name starts in column one here */
char *s1, *s2;
-@{ /* Open brace in column zero here */
+@{ /* Open brace in column one here */
@dots{}
@}
@end example
@cindex commenting
Every program should start with a comment saying briefly what it is for.
-Example: @samp{fmt - filter for simple filling of text}.
+Example: @samp{fmt - filter for simple filling of text}. This comment
+should be at the top of the source file containing the @samp{main}
+function of the program.
+
+Also, please write a brief comment at the start of each source file,
+with the file name and a line or two about the overall purpose of the
+file.
Please write the comments in a GNU program in English, because English
is the one language that nearly all programmers in all countries can
@cindex file-name limitations
@pindex doschk
You might want to make sure that none of the file names would conflict
-the files were loaded onto an MS-DOS file system which shortens the
+if the files were loaded onto an MS-DOS file system which shortens the
names. You can use the program @code{doschk} to test for this.
Some GNU programs were designed to limit themselves to file names of 14
when there is a higher-level alternative (@code{readdir}).
@cindex non-@sc{posix} systems, and portability
-As for systems that are not like Unix, such as MSDOS, Windows, VMS,
-MVS, and older Macintosh systems, supporting them is often a lot of
-work. When that is the case, it is better to spend your time adding
-features that will be useful on GNU and GNU/Linux, rather than on
-supporting other incompatible systems.
+As for systems that are not like Unix, such as MSDOS, Windows, VMS, MVS,
+and older Macintosh systems, supporting them is often a lot of work.
+When that is the case, it is better to spend your time adding features
+that will be useful on GNU and GNU/Linux, rather than on supporting
+other incompatible systems.
If you do support Windows, please do not abbreviate it as ``win''. In
hacker terminology, calling something a ``win'' is a form of praise.
@end example
1989 Standard C requires this to work, and we know of only one
-counterexample: 64-bit programs on Microsoft Windows IA-64. We will
+counterexample: 64-bit programs on Microsoft Windows. We will
leave it to those who want to port GNU programs to that environment
to figure out how to do it.
: "# Implicit rule search has not been done.\n");
@end example
+
+@node Character Set
+@section Character Set
+@cindex character set
+@cindex encodings
+@cindex ASCII characters
+@cindex non-ASCII characters
+
+Sticking to the ASCII character set (plain text, 7-bit characters) is
+preferred in GNU source code comments, text documents, and other
+contexts, unless there is good reason to do something else because of
+the application domain. For example, if source code deals with the
+French Revolutionary calendar, it is OK if its literal strings contain
+accented characters in month names like ``Flor@'eal''. Also, it is OK
+to use non-ASCII characters to represent proper names of contributors in
+change logs (@pxref{Change Logs}).
+
+If you need to use non-ASCII characters, you should normally stick with
+one encoding, as one cannot in general mix encodings reliably.
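A maintainer who wants to confirm that a file sticks to ASCII could use a check along the following lines. This is a minimal sketch; `first_non_ascii` is a hypothetical helper, not part of any GNU tool.

```c
#include <stddef.h>

/* Hypothetical helper: return the offset of the first byte in BUF
   (of length LEN) that lies outside 7-bit ASCII, or -1 if every
   byte is plain ASCII.  */
long
first_non_ascii (const unsigned char *buf, size_t len)
{
  for (size_t i = 0; i < len; i++)
    if (buf[i] > 0x7f)
      return (long) i;
  return -1;
}
```

For the UTF-8 spelling of "Floréal", this reports offset 4, which is where the first byte of the accented character appears.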
+
+
+@node Quote Characters
+@section Quote Characters
+@cindex quote characters
+@cindex locale-specific quote characters
+@cindex left quote
+@cindex grave accent
+
+In the C locale, GNU programs should stick to plain ASCII for quotation
+characters in messages to users: preferably 0x60 (@samp{`}) for left
+quotes and 0x27 (@samp{'}) for right quotes. It is OK, but not
+required, to use locale-specific quotes in other locales.
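In the C locale, a message can simply wrap the quoted text in these two ASCII characters. The following is a minimal sketch that assumes a hypothetical `quote_ascii` helper. Unlike the Gnulib modules, it neither switches to locale-specific quotes nor escapes quote characters inside the text.

```c
#include <stdio.h>

/* Hypothetical helper: write NAME, wrapped in 0x60 (`) and 0x27 (')
   quote characters, into BUF of SIZE bytes.  Return the length that
   snprintf reports.  Quote characters inside NAME are not escaped.  */
int
quote_ascii (char *buf, size_t size, const char *name)
{
  return snprintf (buf, size, "`%s'", name);
}
```

For example, quoting `foo.c` yields the 7-character string `` `foo.c' ``.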
+
+The @uref{http://www.gnu.org/software/gnulib/, Gnulib} @code{quote} and
+@code{quotearg} modules provide a reasonably straightforward way to
+support locale-specific quote characters, as well as taking care of
+other issues, such as quoting a file name that itself contains a quote
+character. See the Gnulib documentation for usage details.
+
+In any case, the documentation for your program should clearly specify
+how it does quoting, if that differs from the preferred method of @samp{`}
+and @samp{'}. This is especially important if the output of your
+program is ever likely to be parsed by another program.
+
+Quotation characters are a difficult area in the computing world at
+this time: there are no true left or right quote characters in Latin1;
+the @samp{`} character we use was standardized there as a grave
+accent. Moreover, Latin1 is still not universally usable.
+
+Unicode contains the unambiguous quote characters required; its first
+256 code points coincide with Latin1, and its common encoding UTF-8 is
+upward compatible with ASCII. However, Unicode and UTF-8 are not
+universally well-supported, either.
+
+This may change over the next few years, and then we will revisit
+these recommendations.
+
+
@node Mmap
@section Mmap
@findex mmap
The @code{configure} script needs to be able to decode all plausible
alternatives for how to describe a machine. Thus,
-@samp{athlon-pc-gnu/linux} would be a valid alias.
-There is a shell script called
-@uref{ftp://ftp.gnu.org/gnu/config/config.sub, @file{config.sub}}
-that you can use
-as a subroutine to validate system types and canonicalize aliases.
+@samp{athlon-pc-gnu/linux} would be a valid alias. There is a shell
+script called
+@uref{http://savannah.gnu.org/cgi-bin/viewcvs/*checkout*/config/config/config.sub,
+@file{config.sub}} that you can use as a subroutine to validate system
+types and canonicalize aliases.
The @code{configure} script should also take the option
@option{--build=@var{buildtype}}, which should be equivalent to a
plain @var{buildtype} argument. For example, @samp{configure
--build=i686-pc-linux-gnu} is equivalent to @samp{configure
i686-pc-linux-gnu}. When the build type is not specified by an option
-or argument, the @code{configure} script should normally guess it
-using the shell script
-@uref{ftp://ftp.gnu.org/gnu/config/config.guess, @file{config.guess}}.
+or argument, the @code{configure} script should normally guess it using
+the shell script
+@uref{http://savannah.gnu.org/cgi-bin/viewcvs/*checkout*/config/config/config.guess,
+@file{config.guess}}.
@cindex optional features, configure-time
Other options are permitted to specify in more detail the software
problem.
The GNU definition of free software is found on the GNU web site at
-@url{http://www.gnu.org/philosophy/free-sw.html}. A list of
+@url{http://www.gnu.org/philosophy/free-sw.html}, and the definition
+of free documentation is found at
+@url{http://www.gnu.org/philosophy/free-doc.html}. A list of
important licenses and whether they qualify as free is in
@url{http://www.gnu.org/licenses/license-list.html}. The terms
``free'' and ``non-free'', used in this document, refer to that