Please review for Man-DB changes
Alexander E. Patrakov
patrakov at gmail.com
Wed Oct 22 21:35:01 PDT 2008
DJ Lucas wrote:
> Guys, I'm obviously lacking creativity tonight. ;-) I've posted a
> local copy of the book in my home dir on quantum. I would like
> someone else (or many somebody elses) to review the textual changes
> on the man-db page for both technical and grammatical errors.
> Thanks in advance.
> Some packages provide UTF-8 man pages, which previous versions of
> Man-DB were unable to display. This limitation has been overcome in
> recent versions, and Man-DB can now convert man pages from legacy
> 8-bit encodings to UTF-8 (and vice-versa) on the fly.
I don't like the wording here. We need to mention two features separately:
1) conversion TO arbitrary encoding on the fly (was present in old
versions of Man-DB, too, but is just a distracting factor here);
2) expectations about the input (changed: the encoding was hard-coded;
now, in addition, Man-DB looks at the extension of the directory name).
Better, but IMHO still not acceptable for anything except the -dev book:
Some packages provide UTF-8 man pages, which previous versions of Man-DB
were unable to display correctly, because the expected (8-bit) encoding
for each language was hard-coded in the source of Man-DB. Now Man-DB
uses the extension of the directory name in order to determine the
encoding of the manual pages stored there, and uses the built-in table
only if the encoding is not specified in the directory name. E.g.,
because of "UTF-8" in the directory name, it knows that all manual pages
residing in /usr/share/man/fr.UTF-8 are UTF-8 encoded and, according to
the built-in table, expects all manual pages residing in
/usr/share/man/ru to be in KOI8-R.
On the other hand, the setup in Fedora Core expected all manual pages to
be UTF-8 encoded and stored in directories without the ".UTF-8" suffix.
Bruce: could you please try to criticise or shorten this?
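To make the lookup rule in the proposed text concrete, here is a minimal
shell sketch of it. This is not Man-DB's code: the function name
expected_encoding is made up for illustration, and the "table" contains
only the one entry (ru) that is mentioned in this post.

```shell
#!/bin/sh
# Hypothetical helper (not part of Man-DB): print the encoding that the
# pages under a language directory are expected to be in, following the
# rule described above.
expected_encoding() {
    dir=$(basename "$1")
    case "$dir" in
        *.*)
            # The encoding is given as the extension of the directory name.
            printf '%s\n' "${dir#*.}"
            ;;
        *)
            # No extension: fall back to a table. Only the entry quoted in
            # this post is reproduced here; the real table is much longer.
            case "$dir" in
                ru) echo KOI8-R ;;
                *)  echo unknown ;;
            esac
            ;;
    esac
}

expected_encoding /usr/share/man/fr.UTF-8   # prints UTF-8
expected_encoding /usr/share/man/ru         # prints KOI8-R
```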
> This used to be
"This" => "Disagreement about the expected encoding of manual pages".
> a rather annoying problem across different distributions, as packages
> written for one distribution would require changes to work on
> This script was written, and included in LFS to overcome
> this problem. The script will allow you to pass an in and out value
> to convert man pages to and from legacy 8-bit and UTF-8 encodings.
Technically, we don't need it. But it is still abused in BLFS to convert
Midnight Commander hints after patching. We definitely don't need the
script so close to the beginning of the page; I propose to move it to
the "Non-English Manual Pages in LFS" section.
> 6.47.2. Non-English Manual Pages in LFS
> Linux distributions have different policies concerning the character
> encoding in which manual pages are stored in the filesystem. E.g.,
> RedHat stores all manual pages in UTF-8, while Debian previously used
and still uses predominantly
> language-specific (mostly 8-bit) encodings. As mentioned above, this
> leads to incompatibility of packages with manual pages designed for
> different distributions.
> LFS previously used the same convention as Debian. This was chosen
> because Man-DB did not understand man pages stored in UTF-8 at the
> time of its introduction into LFS. For our purposes at that time,
> Man-DB was preferable to Man as it worked without any additional
> configuration in any locale.
> This is still true today as Man-DB with
> Debian patched Groff will now properly convert UTF-8 encoded man
> pages to the user's locale on the fly.
Only if they are placed correctly.
> Additionally, this combination
> provides support for Chinese and Japanese locales, and limited
> support for Korean, whereas Man does not.
Wrong. Man does work (if we ignore translations of error messages) with
the same languages if used together with Debian-patched groff. The only
difference is that Man has the pipeline constructed in the configuration
file by the user, while Man-DB constructs the pipeline programmatically
by applying knowledge about the expected input and output encoding of
various programs. Obviously, a user can write the same pipeline into the
Man configuration file, but this would take several pages to explain.
> The current offering of Man
> as used in RedHat requires major modifications to both the Man and
> Groff packages,
> and still falls short on Chinese, Japanese, and
> Korean encodings.
> Finally, it should be noted that most distributions, including
> Debian, are rapidly migrating to all UTF-8 encoded man pages.
Wrong. Most distributions (including Gentoo and Arch) completely ignore
the problem, present the user with an unreadable mix of 8-bit and UTF-8
pages in the same directory, and are thus broken.
The leading and government-sponsored Russian distribution (Alt Linux)
still uses 8-bit (KOI8-R) manual pages. The only distributions that
converted fully are RedHat derivatives. Debian is only starting to get ready.
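For anyone who wants to check whether a directory contains such a mix,
here is a rough sketch. The function name classify_page is hypothetical.
It relies on the fact that "iconv -f UTF-8 -t UTF-8" fails on input that
is not valid UTF-8; this is a heuristic, since a short 8-bit file can
occasionally happen to be valid UTF-8. It assumes uncompressed pages.

```shell
#!/bin/sh
# Hypothetical helper: classify one uncompressed manual page as
# "ascii", "utf-8" or "8-bit".
classify_page() {
    # Delete all 7-bit bytes; if nothing is left, the page is pure ASCII
    # and reads the same under any of the encodings discussed here.
    if [ -z "$(LC_ALL=C tr -d '\000-\177' < "$1")" ]; then
        echo ascii
    # iconv exits with an error on byte sequences that are invalid UTF-8.
    elif iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1; then
        echo utf-8
    else
        echo 8-bit
    fi
}
```

Running it over every file in, e.g., a man1 directory and piping the
result through "sort | uniq -c" shows at a glance whether both "utf-8"
and "8-bit" pages are present.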
> Upstream packagers will very likely drop legacy encodings in favor of
> UTF-8, though adoption has been slow due to the hacks required to
> make the current Man and Groff packages work correctly together.
I don't know how to comment on this. Modern desktop packages come with
DocBook documentation, not manual pages.
> The relationship between language codes and the expected encoding of
> legacy manual pages is listed below.
> Table 6.1. <snip>
Up to this point, nothing has been said (except in the text I proposed at
the very top of my post) about HOW Man-DB determines the encoding of a manual
page. Theory should be given before examples, not in examples. This
worked before, because the whole theory was expressed in the table.
> If upstream distributes the manual pages in a legacy encoding the
> manual pages can simply be copied to /usr/share/man/<language code>.
> For example, German manual pages can be installed with the following
> mkdir -p /usr/share/man/de
> cp -rv man? /usr/share/man/de
> If upstream distributes manual pages in UTF-8 (i.e., “for RedHat”)
> instead of the encoding listed in the table above, they can either be
> converted from UTF-8 to the encoding listed in the table above, or
> they can be installed directly into /usr/share/man/<language
OK. Here the script would go. Also I'd like to see a comparison of both
approaches. E.g., if the manual pages are installed with a Makefile, it
is often easier to convert manual pages before installation than to
patch the Makefile.
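A rough sketch of the two approaches side by side, using French pages
and plain iconv rather than the script from the book. All names here
(MANROOT, LEGACY, install_converted, install_utf8) are made up for
illustration, ISO-8859-1 is my assumption for the table entry for "fr",
and the pages are assumed to be uncompressed and laid out as man1,
man2, etc. under the source directory.

```shell
#!/bin/sh
# Sketch only; the variable and function names are illustrative.
MANROOT=${MANROOT:-/usr/share/man}
LEGACY=${LEGACY:-ISO-8859-1}    # assumed legacy encoding for "fr"

# Approach 1: convert every UTF-8 page to the legacy encoding and
# install it under fr/, where the table lookup applies.
install_converted() {
    src=$1
    for page in "$src"/man?/*; do
        [ -f "$page" ] || continue
        sect=$(basename "$(dirname "$page")")
        mkdir -p "$MANROOT/fr/$sect" || return 1
        iconv -f UTF-8 -t "$LEGACY" "$page" \
            > "$MANROOT/fr/$sect/$(basename "$page")" || return 1
    done
}

# Approach 2: install the pages untouched under fr.UTF-8/, so that the
# directory name itself states the encoding.
install_utf8() {
    src=$1
    mkdir -p "$MANROOT/fr.UTF-8" || return 1
    cp -r "$src"/man? "$MANROOT/fr.UTF-8"
}
```

The first function is the kind of step that can be run on the source
tree before "make install" when the pages are installed by a Makefile;
the second needs the install destination changed, which is where
patching the Makefile comes in.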
> For example, to install Spanish manual pages
Let's drop this buggy package and explain both techniques with French
Alexander E. Patrakov