Character sets on the web server

Pascal J.Bourguignon pjb at informatimago.com
Mon Sep 1 23:41:00 PDT 2003


Alexander E. Patrakov writes:
> Pascal J.Bourguignon wrote:
> 
> > 
> > Alexander E. Patrakov writes:
> >> I tried to enforce koi8-r by printing this requirement (and others) on
> >> paper and distributing this letter, but everyone (including my boss)
> >> violates that and uses cp1251 because Notepad in Windows has no support
> >> for koi8-r and MS Word has no drop-down list to select the character set
> >> of the exported document. I told him to install Aditor, he didn't.
> > 
> > What about converting the files on the server?
> >
> I am afraid that I have no complete understanding of your words. Either you
> mean (1) "Let's store documents only in koi8-r, but automate the process of
> conversion from cp1251", or (2) "Let's store two copies of each document,
> one for humans and one (converted) for htdig".
> 
> Variant (1) has a drawback that when a user views a page in MSIE and then
> selects "View HTML Source" from the menu, the result in Notepad is
> unreadable.
> 
> Variant (2) is probably impossible since htdig is a bot that makes requests
> to the actual web server.

Variant (3): Let the users use whatever encoding they want, but have
them upload their edited copies of the documents to a place that is
not directly served by the web server. Then a daemon takes these
whatever-encoding copies and converts them to the official koi8-r
encoding for the web server and the ht://Dig engine.
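
A minimal sketch of the conversion step such a daemon would perform,
assuming the uploads are HTML and arrive in cp1251. The function name
to_koi8r and its default source encoding are illustrative only. It does
not touch any charset declaration inside the document, which would need
separate handling.

```python
def to_koi8r(data: bytes, source_encoding: str = "cp1251") -> bytes:
    """Re-encode an uploaded document to koi8-r.

    Characters that koi8-r cannot represent (the em-dash, for instance)
    are written as numeric character references such as &#8212;.
    That fallback is only meaningful for HTML, which is an assumption here.
    """
    # Decode to Unicode text first, then encode to the target charset.
    text = data.decode(source_encoding)
    return text.encode("koi8_r", errors="xmlcharrefreplace")
```

Falling back to character references keeps the conversion from failing
on the characters that exist in cp1251 but not in koi8-r.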

I could suggest using another web browser, but the users should
either keep their own "original" copy of the documents for further
editing, or you could set up another system. For example, you could
keep these documents in a CVS server, and CVS can be configured to
process the documents on check-in or check-out, so you could do a
different conversion for each user.

Alternatively, if you insist on the users being able to edit
documents fetched from an HTTP server, then you could have two
virtual servers: www.example.com (or koi8-r.www.example.com) and
cp1251.www.example.com. Your users could then browse
cp1251.www.example.com and be happy, and a daemon would convert the
encoding from one virtual server to the other, so that you and the
public would be happy with the standard encoding.
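
The daemon between the two virtual servers could be sketched roughly as
follows, assuming each virtual server has its own document root on disk
and that only files ending in .html or .htm need re-encoding. The
function name mirror_tree, its parameters, and the direction (cp1251
tree to koi8-r tree) are assumptions made for illustration.

```python
from pathlib import Path


def mirror_tree(src_root: Path, dst_root: Path,
                src_enc: str = "cp1251", dst_enc: str = "koi8_r",
                suffixes: tuple = (".html", ".htm")) -> list:
    """Re-encode changed documents from one document root into another.

    Returns the list of destination files that were (re)written.
    """
    written = []
    for src in sorted(src_root.rglob("*")):
        # Only text documents are re-encoded; images etc. are skipped here.
        if not src.is_file() or src.suffix.lower() not in suffixes:
            continue
        dst = dst_root / src.relative_to(src_root)
        # Skip files whose converted copy is already up to date.
        if dst.exists() and dst.stat().st_mtime >= src.stat().st_mtime:
            continue
        dst.parent.mkdir(parents=True, exist_ok=True)
        text = src.read_bytes().decode(src_enc)
        # Unrepresentable characters become numeric character references.
        dst.write_bytes(text.encode(dst_enc, errors="xmlcharrefreplace"))
        written.append(dst)
    return written
```

Run periodically (from cron, say), this keeps the koi8-r tree in step
with whatever the users save into the cp1251 one.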

 
> Anyway, at least on other sites people like to insert &#xx; for opening and
> closing "French" quotes, and the codes for them are different in koi8-r and
> cp1251. Also there are some unconvertible characters in both directions
> (like em-dash), so the conversion script must be rather elaborate to find
> these cases.

Use Unicode!
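
To illustrate the point about unconvertible characters, here is a small
sketch that lists which characters of a sample string each charset
cannot represent. The helper name unencodable and the sample string are
made up for the example.

```python
def unencodable(text: str, encoding: str) -> list:
    """Return the distinct characters of `text` that `encoding` cannot represent."""
    missing = []
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            if ch not in missing:
                missing.append(ch)
    return missing


# Cyrillic text, an em-dash (U+2014) and a box-drawing double line (U+2550).
sample = "Привет \u2014 \u2550"

print(unencodable(sample, "koi8_r"))   # the em-dash is missing from koi8-r
print(unencodable(sample, "cp1251"))   # the box-drawing line is missing from cp1251
print(unencodable(sample, "utf-8"))    # nothing is missing
```

Each 8-bit charset loses something the other has, while a Unicode
encoding such as UTF-8 carries all of it.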


> BTW, if I could adapt to having the documents stored in cp1251, there would
> be no problem. The incorrect sorting order in PHP can be fixed by setting
> LC_ALL=ru_RU.cp1251. But still, if I ssh from home (Linux, koi8-r) to the
> server, my local encoding is koi8-r, and fonts expect koi8-r. The same
> applies to everyone using PuTTY.
> 
> Something internal still tells me that it is very wrong to use Linux with
> such windowsish people.

That's the reverse: it's very  wrong to use MS-Windows and MS software
in general.


-- 
__Pascal_Bourguignon__                   http://www.informatimago.com/
----------------------------------------------------------------------
Do not adjust your mind, there is a fault in reality.



More information about the lfs-chat mailing list