UTF-8 in {,B}LFS

Bruce Dubbs bdubbs at swbell.net
Wed Oct 19 21:18:09 PDT 2005


In some non-list traffic, there has been some discussion of UTF-8 for
{,B}LFS.  I'm posting this discussion to the wider community for comments.

The question is: should {,B}LFS support UTF-8?  If so, who will be
responsible for the UTF-8 specific portions of the books and how should
the material be presented?

Please reply to the blfs-dev list.

  -- Bruce


From Alexander Patrakov:

Currently, the (unofficial) UTF-8 LFS book (rightfully) says that even
very basic tasks, such as viewing ID3v1 tags, printing plain-text files
from the command line, and recording Windows-readable CDs, are not
possible with the current BLFS if a UTF-8 locale is selected and the items
in question contain non-ASCII characters. For some of these issues, a fix
is known. E.g., BEEP Media Player and Kaffeine handle ID3 tags
properly, plain-text UTF-8 files can be printed with Cedilla (requires
GNU Common Lisp), and a patch exists for cdrtools that allows any
iconv-supported encoding to be used as the filename charset. Other problems
can be addressed only by marking a package (e.g. Links) as incompatible
with UTF-8 locales.

From Randy McMurchy:

Since LFS is not at this point UTF-8 compatible, I'm not sure
that we need to do anything at all for BLFS. However, I'm not
ruling out adding BEEP, or other packages to BLFS.

[Other problems...]
I can only see this happening after either 1) LFS has adopted full
UTF-8 compatibility, or 2) there is a rendered UTF-8 branch supported
by the LFS team. Because right now the only UTF-8 LFS is provided by
your live-CD, where you already include all the BLFS packages you
need, I'm not sure the BLFS book is ready to adopt these changes.

Until one of the two above happens, what might be best is a section
at the beginning of the book, explaining some of these UTF-8 issues.

I would like to get Matt's opinion on the UTF-8 branch factored
into this discussion as well. If LFS doesn't go with a UTF-8
supported version, then I'm not sure that BLFS should either.

Understand that I'm not trying to stifle your efforts; I just don't
think that BLFS should support UTF-8 locales if LFS does not.

From Matthew Burgess:

IMO, it doesn't count as supported unless it is being directly rendered
from the SVN repository also hosted on belgarath, which I'm pretty sure
it isn't.  I'll quite happily set up a UTF-8 branch for the LFS book,
though, and give commit access to anyone who wants or needs it.  It's
about time I tried to understand what is required for UTF-8 stuff to be
supported, and I feel really bad for effectively ignoring Alexander's
work up until now.

From Bruce Dubbs:

My problem is that I cannot relate to non-English locales and would have
no way of testing UTF-8 instructions.  I'm not opposed to it for BLFS,
but I don't really see a way to make it happen.

In addition, any UTF-8 instructions need to be made optional for those
who don't need to use the capabilities.

From Alexander Patrakov:

In BLFS, almost all of them will be optional by their very nature. Let's
look at some examples:

1) You don't need the new "cedilla" package, thus you don't install it.
2) A note on the a2ps package page saying "This package doesn't support
UTF-8 encoding of text documents. Use Cedilla to print such documents
from the command line". You ignore the note since a2ps works fine for
pure ASCII.
3) A note on the Links package page saying "This package doesn't work in
UTF-8 locales; don't install it if you use them". You ignore the note
(because it doesn't apply to your situation) and install the package anyway.
4) A note on the nano package page saying "Stable versions of nano don't
support UTF-8. Install a development version (1.3.8) using the same
instructions if you need UTF-8 support". You ignore the note and install
nano-1.2.5.
5) A note on the Midnight Commander page mentioning an optional UTF-8
patch. You ignore the patch.

So most of the BLFS changes will have absolutely no effect (except for
wasted screen space) on readers who don't need them.
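As a rough illustration of the decision the a2ps note in point 2 leaves
to the reader, here is a minimal Python sketch. The function name
printing_path and its three result labels are hypothetical, made up for
this example; it only checks whether a file is pure ASCII (the case where
the note can be ignored) or valid UTF-8 with non-ASCII characters (the
case the note sends to Cedilla).

```python
# Hypothetical helper illustrating the a2ps/Cedilla note from point 2.
# The name and the returned labels are illustrative, not part of any package.

def printing_path(data: bytes) -> str:
    """Classify file contents for printing from the command line.

    'a2ps'     -> pure ASCII, which a2ps handles fine
    'cedilla'  -> valid UTF-8 containing non-ASCII characters
    'not-utf8' -> neither; probably some legacy 8-bit encoding
    """
    if data.isascii():  # bytes.isascii() needs Python 3.7+
        return "a2ps"
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "not-utf8"
    return "cedilla"


if __name__ == "__main__":
    print(printing_path(b"plain old ASCII\n"))          # a2ps
    print(printing_path("Привет\n".encode("utf-8")))    # cedilla
    print(printing_path("Привет\n".encode("koi8-r")))   # not-utf8
```

The "not-utf8" label only means the sketch cannot tell which 8-bit
encoding the file uses; it says nothing about whether a2ps can print it
in a traditional locale.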

As for LFS, the situation is different from what you propose for BLFS.
The system resulting from building the UTF-8 book is always UTF-8 ready
(i.e. no optional build steps), but it is the user's option to configure
it to actually use a traditional or UTF-8 based locale. The main changes
in the text amount to:

1) Patches to coreutils, diffutils and grep, well tested by distros and
in fact required for LSB certification.
2) Downgrading groff to 1.18.1.1 + Debian patch. Absolutely needed for
Japanese users. Doesn't harm others. Avoids a language-based "if" in the
LFS book.
3) Replacing Man with Man-db (done in order to avoid "if"s in the Man
configuration). Not sure that this will pass, because it brings in gdbm
as a dependency. An alternative would be to still install Man but with
"+lang none" and import the whole long "hacks" section from my man-i18n
hint.

Note that the whole man/groff setup is optimized for compatibility with
the current BLFS and for the ability to revert to a non-UTF-8 setup
easily. If I chose to go with a RedHat-like setup instead, BLFS editors
would have to add a couple of optional-but-hard-to-revert "iconv"
commands for every BLFS package that installs non-English manual pages.

4) Building ncurses with wide-character support. Such libraries still
work in traditional 8-bit locales, and cause no build problems with BLFS.
5) Upgrading the "console" script. It is backward-compatible with the
original one, so one could just copy /etc/sysconfig/console from his old
system if UTF-8 is not wanted.
6) A kernel patch needed in order to copy/paste UTF-8 with GPM and for
dead keys to work in UTF-8 locales.
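To make the note about "iconv" commands a little more concrete, the
following Python sketch shows what such a conversion does to a manual
page stored in a traditional 8-bit encoding. ISO-8859-1 for a German
page is an assumption picked for illustration, and recode is a
hypothetical helper name; each package's pages would need the charset of
their own language. The last check hints at why the step is hard to
revert: going back requires knowing which legacy charset was used.

```python
# Minimal sketch of what an "iconv -f <legacy> -t UTF-8" step does to a
# manual page. The charset (ISO-8859-1 for German) is an illustrative
# assumption; real pages may use other legacy encodings.

def recode(data: bytes, src: str, dst: str = "utf-8") -> bytes:
    """Decode bytes from charset `src` and re-encode them in `dst`."""
    return data.decode(src).encode(dst)


if __name__ == "__main__":
    # A German fragment in ISO-8859-1: one byte per character.
    legacy = "Größe der Datei".encode("iso-8859-1")
    converted = recode(legacy, "iso-8859-1")
    # "ö" and "ß" each take two bytes in UTF-8, so the length grows.
    print(len(legacy), len(converted))  # 15 17
    # Reverting only works if we still know the original charset name;
    # the UTF-8 bytes themselves do not record it.
    assert recode(converted, "utf-8", "iso-8859-1") == legacy
```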



More information about the lfs-dev mailing list