Please review for Man-DB changes
dj at linuxfromscratch.org
Sat Oct 25 13:16:52 PDT 2008
Alexander E. Patrakov wrote:
> DJ Lucas wrote:
>> Many other distributions ignore the on disk encodings completely,
>> leaving the end user with a mix of improperly encoded manual pages.
> Well, the end user doesn't care how the manual pages are encoded on
> disk. The only thing that matters is if they are displayed correctly.
> And I can't translate the sentence into Russian, because I don't know
> how an encoding can be ignored by the distribution. Issues can be
> ignored, and encodings can be mishandled.
> And you lost the important bit from your previous mail, that in such
> distributions some pages (that match the de-facto Man setup) are
> readable, while others display as completely "illegible" lines of
> And BTW, Lingvo (the leading online English<->Russian dictionary)
> doesn't even list your intended meaning among the list of available
> translations for "illegible". They think that this word can apply only
> to handwriting or typesetting, and is a synonym for "blurry", or "too
> small to read". I.e., it means something which can be characterized with
> a certain degree of "illegibility", while we are talking about perfectly
> displayed, but wrong characters (and one cannot talk about "more
> correct" or "less correct" characters). So, please choose another word.
>> When man encounters an unexpected encoding, it will display the contents
>> as configured, resulting in completely illegible text.
> Man (original) doesn't _know_ the encoding. It just passes the manual
> page through a pipeline designed (deliberately or by copying others'
> setup blindly) to process text in a certain encoding. Garbage in,
> garbage out. Yes, that's essentially what you said, but not all Man
> implementations have enough brains to "expect" some encoding - the
> original Man just pipes text through the static user-configured pipeline.
> Sorry, it is too late here for me to try suggesting a better wording. I
> will do this tomorrow if you don't do it yourself while I sleep.
I qualified the existing text by appending "for their configuration".
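To make the "garbage in, garbage out" point concrete: a pipeline fixed to one charset will render bytes written in another without complaint, producing well-formed but wrong characters. Below is a minimal Python sketch. It assumes a page stored as KOI8-R and a setup that expects CP1251; both charsets are chosen only for illustration, and render_with_fixed_charset() is a hypothetical name, not part of any Man implementation.

```python
def render_with_fixed_charset(page_bytes: bytes, pipeline_charset: str) -> str:
    """Decode page bytes with whatever charset the static pipeline assumes.

    The pipeline never inspects the page. It just applies its configured
    charset, so a mismatch yields valid but wrong characters.
    """
    return page_bytes.decode(pipeline_charset, errors="replace")


word = "Привет"
on_disk = word.encode("koi8-r")                        # page actually stored as KOI8-R
shown = render_with_fixed_charset(on_disk, "cp1251")   # setup assumes CP1251
print(shown)  # well-formed Cyrillic, but not "Привет"
```

Nothing in the output signals an error. The characters are perfectly displayed, just not the ones the author wrote, which is exactly why "illegible" is the wrong word.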
>>>> Man-DB uses a
>>>> built-in table (see below) to find the correct search directory for
>>>> manual pages based on the user's locale settings.
>>> No, it doesn't look into the table in this case. See add_nls_manpath()
>>> in http://www.chiark.greenend.org.uk/~cjwatson/bzr/man-db/trunk/src/manp.c
>>> It iterates over all subdirectories and tests whether the subdirectory
>>> is for the user's language, completely disregarding the encoding.
>> ...ships with manual pages in legacy encodings. Man-DB uses a built-in
>> table (see below) to determine the on disk encoding of the manual pages
>> found for a user's locale. If the directories found do not contain the
>> ".UTF-8" extension, Man-DB checks the table, and performs the necessary
>> conversion. E.g., because of "UTF-8" in the directory name...
> It doesn't work this way. Suppose that the user's locale is
> ll_CC.CODESET. Man looks for subdirectories of /usr/share/man that,
> after removing a possible suffix, reduce to either ll_CC or ll. For each
> of the directories found with a suffix, it uses the suffix as the
> encoding. If the directory has no suffix, Man-DB checks the table.
> "UTF-8" has no special meaning, but your text creates a false impression
> that it does. E.g., if /usr/share/man/ru.CP1251 existed, Man-DB would
> expect to find CP1251-encoded manual pages there. Again, please read the
> source. Oh, you did.
Yes, I reworked it to accurately describe the process.
>> Some interesting reading in the source. Looks like at least
>> unpack_locale_bits() does not care what the codeset is, but it's checked
>> in encodings.c. So:
>> ...If the directories found do not contain an extension, Man-DB checks
>> the table, and performs the necessary conversion. E.g., because of
>> "UTF-8" extension in the directory name...
> It always performs the necessary conversion (e.g., in ru_RU.KOI8-R
> locale, it can use manual pages from /usr/share/man/ru.UTF-8), so let's
> drop or move "and performs the necessary conversion".
Yes. I dropped it.
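The conversion point can be sketched the same way. Whatever directory a page came from, it is decoded from that directory's encoding and re-encoded for the user's locale. Man-DB does this in its own conversion step. The Python below only uses the standard codecs as a stand-in, and recode_page() is a hypothetical name.

```python
def recode_page(raw: bytes, page_encoding: str, locale_charset: str) -> bytes:
    """Convert a manual page from its on-disk encoding to the locale's charset.

    Characters the target charset cannot represent become '?' instead of
    aborting the whole page.
    """
    text = raw.decode(page_encoding)  # e.g. the encoding taken from a ".UTF-8" suffix
    return text.encode(locale_charset, errors="replace")


# A page from /usr/share/man/ru.UTF-8 shown in a ru_RU.KOI8-R locale.
page = "Привет".encode("utf-8")
print(recode_page(page, "UTF-8", "KOI8-R").decode("koi8-r"))  # Привет
```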
We are now close enough for most, without being misleading or
technically incorrect, and far better than the previous for the current
offering. I'm going to commit now so that others can make adjustments
to it if need be.
-- DJ Lucas