Please review for Man-DB changes

Alexander E. Patrakov patrakov at
Thu Oct 23 22:12:50 PDT 2008

DJ Lucas wrote:

> 6.47.2. Non-English Manual Pages in LFS
> Some packages provide UTF-8 manual pages, which previous versions of
> Man-DB were unable to display correctly because the expected (8-bit)
> encoding for each language was hard-coded in the source of Man-DB.
> Man-DB now uses the extension of the directory name in order to
> determine the encoding of the manual pages stored within. If no
> extension exists, Man-DB uses a built-in table (see below) to
> determine the encoding. E.g., because of "UTF-8" in the directory
> name, it knows that all manual pages residing in
> /usr/share/man/fr.UTF-8 are UTF-8 encoded and, according to the
> built-in table, expects all manual pages residing in
> /usr/share/man/ru to be encoded using KOI8-R.
> Linux distributions have different policies concerning the character
> encoding in which manual pages are stored in the filesystem. E.g.,
> RedHat stores all manual pages in UTF-8, while Debian previously used
> language-specific (mostly 8-bit) encodings. Many other distributions
> simply ignore the problem all together. LFS also used the legacy
> encodings in previuos versions of the book. This was chosen because

Typo ("previuos"). Also, the text is misleading: it suggests that 
legacy encodings are no longer used.

> of the ease of configuration associated with Man-DB. Additionally,

Readers won't understand this.

> Man-DB provided support for Chinese and Japanese locales, and limited
> support for Korean, whereas Man did not at that time.

Man does support Japanese, by means of the JNROFF directive.

> In contrast, the setup in Fedora Core expects all manual pages to be
> UTF-8 encoded, and stored in directories without suffixes.

Duplicate information. So we need to agree on the examples of directory 
layout that we demonstrate, and their order.

And IMHO, the whole text above (right from the heading) needs to be 
reordered. Something like this:

Some packages provide non-English manual pages. They are displayed 
correctly only if their location and encoding match the expectations of 
the "man" program. However, different Linux distributions have different 
policies (expressed in the choice of the "man" program, its 
configuration and patches applied to it) concerning the character 
encoding in which manual pages are stored in the filesystem.

E.g., Debian previously required Russian manual pages to be encoded in 
KOI8-R and to be placed in /usr/share/man/ru. Now, in addition, their 
"man" program searches for UTF-8 encoded Russian manual pages in 
/usr/share/man/ru.UTF-8. On the other hand, Fedora stores UTF-8 encoded 
Russian manual pages in /usr/share/man/ru and their "man" program 
doesn't look into /usr/share/man/ru.UTF-8.

Yes, a significant portion of the text has been thrown away.

> Disagreement about the expected encoding of manual pages amongst
> distribution vendors, has led to confusion for upsteam package
> maintainers. Some packages contain, UTF-8 manual pages, while others

No comma after "contain".

> ship with manual pages in legacy encodings.

At this point we have, I think, clearly stated the problem for 
upstream maintainers.

After that, we can explain our setup: "Man-DB uses the extension of the 
directory name...", including the examples, even though they duplicate 
our explanation of the modern Debian setup (not all readers know that 
Debian uses Man-DB).
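
To make the rule concrete, here is a minimal Python sketch of the lookup as 
the draft describes it. It is not Man-DB's actual code: the function name 
and the one-entry table are my own illustration, and only the "ru" entry 
comes from the examples above (the real built-in table is larger).

```python
import posixpath

# Illustrative subset of the built-in table. Only the "ru" entry is taken
# from the examples in this thread; this is an assumption-laden stand-in
# for the real table, not a copy of it.
LEGACY_ENCODINGS = {"ru": "KOI8-R"}


def expected_encoding(man_dir):
    """Return the encoding expected for pages under man_dir, or None."""
    name = posixpath.basename(man_dir.rstrip("/"))
    lang, dot, ext = name.partition(".")
    if dot and ext:
        # The directory name carries an extension: it names the encoding.
        return ext
    # No extension: fall back to the per-language table.
    return LEGACY_ENCODINGS.get(lang)


print(expected_encoding("/usr/share/man/fr.UTF-8"))
print(expected_encoding("/usr/share/man/ru"))
```

A language missing from the table yields None here; what the real program 
does in that case is deliberately left out of the sketch.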

Neither of the two quotes below should appear in the book.

> Unlike the Man/Groff
> setup in Fedora Core, Man-DB can make very good decisions about the

Only if the user placed the manual pages correctly.

> on disk encoding and present the information to the user in their
> prefered format, without complex configurations.

Man in Fedora Core ships preconfigured, and, due to exclusive use of 
UTF-8, there are no decisions to make. The setup is completely 
transparent to the end users as long as only prepackaged software is used.

Please stop trying to show that the Debian setup is better. The only 
benefits are that it allows for a transition period when UTF-8 and 
legacy manual pages coexist in different directories, and that it 
requires fewer patches than the approach from RedHat. I take back my 
statement about lengthy configuration: it is invalid with RedHat Groff 
(but valid with the upstream Groff).

> Man-DB has, for the most part, made this problem completely

"this problem" refers to something too far away.

> transparent to end users, as long as the manual pages are installed
> into the correct directory.

Not sure if the two quotes above should appear in the book at all. 
Above, we discussed the problem for upstream maintainers, while "this 
problem" refers to something seen by the end users. Yes, I cheated by 
removing the note about the mess present in most distributions, but I 
don't see where to reinsert it.

> There may be times, however, where one
> encoding is preferred over the other.

Without examples, this is a meaningless phrase. And I think it is not 
the encoding that is preferred, but we prefer one or the other way to 
modify the upstream installation process. To see what I mean, try 
converting MPlayer manual pages to UTF-8 after unpacking the tarball, 
and pretending that this is what upstream provided. You can either 
convert back, or move the installed manual pages (not very clean, but we 
do it for some binaries anyway), or patch the Makefiles. After seeing 
the steps each approach requires, you will perhaps be able to come up 
with a better phrase.
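
For the "convert back" route, the recoding step itself is small. A minimal 
Python sketch, assuming an uncompressed page already read into memory and 
using KOI8-R and UTF-8 purely as example encodings (the function name is 
hypothetical, and this says nothing about which encoding any particular 
package actually ships):

```python
def recode_page(data, source="KOI8-R", target="UTF-8"):
    """Recode the raw bytes of one manual page from source to target."""
    # decode() raises UnicodeDecodeError if the bytes are not valid in the
    # source encoding, which is a useful check that the guess was right.
    return data.decode(source).encode(target)


legacy = "страница".encode("KOI8-R")   # stand-in for a legacy-encoded page
utf8 = recode_page(legacy)             # legacy -> UTF-8
back = recode_page(utf8, source="UTF-8", target="KOI8-R")  # and back again
```

The recoding is the easy part; the real work in each approach is deciding 
where in the upstream installation process to hook it in.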

<snip the script and the table>

> Following LFS's previous policy, if upstream distributes the manual

Not sure if the reference to our previous setup (not policy, as we 
couldn't change it!) is a good thing.

The rest is OK.

Alexander E. Patrakov
