Please use a locale that uses the windows 1252 character encoding

Arch Linux

#1 2009-07-23 10:42:12

locale and character encoding. What to do about these dreadful ÅÄÖ??

It’s time for me to get it into my head how this works. Please, help me understand before I go nuts.
I’m from Sweden and we use a few of these weird characters like ÅÄÖ.

If I create a file called «övrigt.txt» in Windows, the file will turn up as «?vrigt.txt» on my Linux PC (at least in the console; sometimes it looks OK in other apps in X). The same is true if I create the file in Linux and copy it to Windows: it will look just as weird on the other side.

As I (probably) can’t change the way Windows works, my question is what I have to do to make these two systems play nicely with each other.

This is the output from locale:

Is there anything here I should change? I have tried using ISO-8859-1 with no luck. Mind you, I want to keep the system-wide language set to English. The only thing I want to achieve is that «Ö» on Windows turns up as «Ö» in Linux as well, and vice versa.

Please save my hair from being torn off, I’m going bald here.

#2 2009-07-26 11:45:59

Re: locale and character encoding. What to do about these dreadful ÅÄÖ??

I’ve already gone bald. I have very little hair remaining to tear out.

The problem you have is due to the different character encodings: Windows-1252 and UTF-8. The utilities to change the encodings are iconv, recode and convmv.
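To see why the «?» shows up, here is a minimal Python sketch of the mismatch. It assumes the file name reaches the other system as raw bytes in the sender's encoding, which may not hold for every transfer method; the variable names are only illustrative.

```python
# Illustrative only: the same file name as raw bytes under each encoding.
name = "övrigt.txt"

cp1252_bytes = name.encode("cp1252")   # one byte for ö: b'\xf6vrigt.txt'
utf8_bytes = name.encode("utf-8")      # two bytes for ö: b'\xc3\xb6vrigt.txt'

# A UTF-8 console reading the cp1252 bytes hits an invalid byte
# and shows a placeholder character instead of ö.
seen_on_linux = cp1252_bytes.decode("utf-8", errors="replace")

# A cp1252 system reading the UTF-8 bytes shows two unrelated
# characters where the single ö should be.
seen_on_windows = utf8_bytes.decode("cp1252")

print(repr(seen_on_linux))
print(repr(seen_on_windows))
```

Neither side has lost the name; each side is just interpreting the other's bytes with the wrong table.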

iconv is probably already installed. The other two are available in extra. iconv is older and there are many tutorials on the internet. recode was designed to replace iconv. The info pages for recode include a tutorial. convmv is used to translate just the filenames from one encoding to another.
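As a rough idea of what translating a file name between encodings involves, here is a small Python sketch. It is not how convmv is implemented; recode_filename is a hypothetical helper, and the "already valid UTF-8, leave it alone" check is an assumption that can misfire on names whose Windows-1252 bytes happen to form valid UTF-8.

```python
def recode_filename(raw: bytes, source: str = "cp1252", target: str = "utf-8") -> bytes:
    """Return the file name bytes re-encoded from source to target.

    Names that already decode cleanly in the target encoding
    (including plain ASCII names) are returned unchanged.
    """
    try:
        raw.decode(target)
        return raw
    except UnicodeDecodeError:
        # Interpret the bytes with the source table, then write them
        # back out with the target encoding.
        return raw.decode(source).encode(target)


print(recode_filename(b"\xf6vrigt.txt"))
```

To rename real files with something like this, the names would have to be read as bytes (for example with os.listdir(b".")) so that Python does not decode them first.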

How to get the encoding changes to be automatic is beyond me. You might try a small vfat partition to store the problem files, using the proper codepage:

That might work. Someone probably has an elegant solution; I’m sorry that this is just a kludge.

#3 2009-07-26 12:33:04

Re: locale and character encoding. What to do about these dreadful ÅÄÖ??

Hi, this bug report might be relevant for you: http://bugs.archlinux.org/task/7549
It says «gnome-mount» in the title but afaik it applies globally, not only to gnome.

To preserve special characters, mounting with «-o iocharset=utf8» works for me.

Last edited by schuay (2009-07-26 12:34:01)

#4 2009-07-27 18:50:50

Re: locale and character encoding. What to do about these dreadful ÅÄÖ??

Have you edited /etc/locale.gen and regenerated the locales (there is an executable for that, read the wiki)?

If people do not believe that mathematics is simple, it is only because they do not realize how complicated life is.
Simplicity is the ultimate sophistication.

#5 2009-07-27 22:19:50

Re: locale and character encoding. What to do about these dreadful ÅÄÖ??

How do you share your files between the systems; Samba, mounted disk, FTP or something else?

PS. I’m from Sweden too, so maybe I can help a fellow Swede in need 😉

Linux is just like an indian tent: no Gates, no Windows and an Apache inside.

#6 2009-07-28 07:11:03

Re: locale and character encoding. What to do about these dreadful ÅÄÖ??

Hi antis, if you are talking about a USB flash disk, then the problem is that some file managers refuse to mount a FAT32 file system with UTF-8 encoding, since there is no official support for this in the kernel. If you want the correct encoding, either a) use another file manager (PCManFM works for me) or b) make your file manager mount the disk with UTF-8 encoding. At first I tried the second option (http://bbs.archlinux.org/viewtopic.php?id=73804) but I could not get it to work.

#7 2009-07-28 08:15:45

Re: locale and character encoding. What to do about these dreadful ÅÄÖ??

Hey, thanks for all the answers!

I share my files in a number of ways, but mainly through a web application called Ajaxplorer (very nice, btw). The thing is that as soon as a Windows user uploads anything with special characters in the file name, my programs (xbmc, the console, etc.) refuse to read them correctly. Other ways of sharing are file copying with USB sticks, ssh, etc. I don’t think the way of sharing is really the problem, but rather the special characters that are sometimes used.

I could probably convert the file names with the suggested applications, but then I’ll get the Windows users into trouble when they want to download them again, won’t I?

I realize that cp1252 is the bad guy in this drama. Is there no way to set or use cp1252 as the character encoding in Linux? It’s probably a bad idea, as UTF-8 seems to be the way of the future, but it’s pretty frustrating that these two OSes can’t communicate better in this area, if you ask me.

To wrap this up I’ll answer some questions.
@EVRAMP: I’m actually using pcmanfm, but that is only for me and I’m not dealing very often with vfat partitions to be honest.
@pkervien: Well, I think I mentioned my forms of sharing above. (Nice to see some Arch Swedes!)
@quarkup: locale.gen is edited and both sv_SE and en_US have UTF-8 and ISO-8859 enabled and generated.

...and to clarify things even further: it doesn’t matter if I get or provide a file via a USB stick, Samba, FTP or on paper. All I want is for «Ö» to always be «Ö», everywhere.

I can’t believe how hard this is to get around. Linus is Finnish, for crying out loud. I thought he’d have sorted this out as the first thing he did. Maybe he doesn’t deal with Windows or its users at all.

Character with encoding UTF8 has no equivalent in WIN1252

I am getting the following exception:

Is there a way to eradicate such characters, either via SQL or programmatically?
(SQL solution should be preferred).

I was thinking of connecting to the DB using WIN1252, but it will give the same problem.

What are you doing when you get this message? Are you importing a file into Postgres? As devstuff said, it is a BOM character. This is a character Windows writes first in a text file when the file is saved in UTF-8 encoding. It is an invisible, zero-width character, so you will not see it when you open the file in a text editor.

Try opening the file in, for example, Notepad, saving it with ANSI encoding, and adding a set client_encoding to ‘WIN1252’ line to the file (or replacing a similar line that is already there).
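The Notepad step can also be done programmatically. The following is a minimal Python sketch, assuming the input really is UTF-8 and contains only characters that Windows-1252 can represent; utf8_to_cp1252 is a hypothetical helper name, and the SQL text is just sample data.

```python
import codecs


def utf8_to_cp1252(data: bytes) -> bytes:
    """Drop a leading UTF-8 BOM (if any) and re-encode as Windows-1252."""
    # The "utf-8-sig" codec decodes UTF-8 and silently skips a leading BOM.
    text = data.decode("utf-8-sig")
    # Raises UnicodeEncodeError if a character has no cp1252 mapping.
    return text.encode("cp1252")


# Sample input: what an editor might write when saving "as UTF-8" with a BOM.
sample = codecs.BOM_UTF8 + "SELECT 'Ö';".encode("utf-8")
converted = utf8_to_cp1252(sample)
print(converted)
```

If the encode step raises, the file contains characters that no Windows-1252 client can receive, and changing the client encoding is the better route.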

I had a similar issue, and I solved it by setting the encoding to UTF8 with \encoding UTF8 in the client before attempting an INSERT INTO foo (SELECT * from bar WHERE x=y); . My client was using WIN1252 encoding but the database was in UTF8, hence the error.

More info is available on the PostgreSQL wiki under Character Set Support (devel docs).

Don’t eradicate the characters, they’re real and used for good reasons. Instead, eradicate Win1252.

I had a very similar issue. I had a linked server from SQL Server to a PostgreSQL database. Some data in the table I was selecting from with an openquery statement contained a character that didn’t have an equivalent in Win1252. The problem was that the System DSN entry (found under the ODBC Data Source Administrator) I had used for the connection was configured to use PostgreSQL ANSI(x64) rather than PostgreSQL Unicode(x64). Creating a new data source with Unicode support, creating a new linked server based on it, and referencing the new linked server in the openquery resolved the issue for me. Happy days.

That looks like the byte sequence 0xBD, 0xBF, 0xEF as a little-endian integer, in other words the bytes 0xEF 0xBF 0xBD in stream order. Note that those three bytes are the UTF-8 encoding of U+FFFD, the Unicode replacement character; the byte-order mark (BOM), U+FEFF, is encoded in UTF-8 as 0xEF 0xBB 0xBF.

I’m not sure what PostgreSQL’s normal behaviour is, but the BOM is normally used only for encoding detection at the beginning of an input stream, and is usually not returned as part of the result.

In any case, your exception is due to this code point not having a mapping in the Win1252 code page. This will occur with most other non-Latin characters too, such as those used in Asian scripts.
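For the "programmatically" part of the question, here is a minimal Python sketch that finds, and optionally drops, the characters with no Windows-1252 mapping. The function names are hypothetical, and silently dropping characters loses data, so treat it as a diagnostic aid rather than a recommended fix.

```python
from typing import List


def unmappable_in_cp1252(text: str) -> List[str]:
    """Return the characters of text that Windows-1252 cannot represent."""
    bad = []
    for ch in text:
        try:
            ch.encode("cp1252")
        except UnicodeEncodeError:
            bad.append(ch)
    return bad


def drop_unmappable(text: str) -> str:
    """Remove every character that has no Windows-1252 mapping."""
    return text.encode("cp1252", errors="ignore").decode("cp1252")


sample = "\ufeffÅÄÖ 日本"
print([hex(ord(ch)) for ch in unmappable_in_cp1252(sample)])
print(repr(drop_unmappable(sample)))
```

The Swedish letters survive because they are part of Windows-1252; the BOM code point and the Japanese characters do not.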

Can you change the database encoding to be UTF8 instead of 1252? This will allow your columns to contain almost any character.

Anything wrong with using windows-1252 instead of UTF-8

I have a test site that has been using windows-1252 all along. They do need/use some symbols like the square root symbol, and they have no need to display any language other than English. I was recently asked to switch it to UTF-8 because of some security concerns. After I changed it to UTF-8, the square roots and other symbols (which are being pulled out of an Oracle DB and passed through ColdFusion) would appear fine on the resulting web page. However, if I saved the document again (post to DB, page refreshes), the symbols transformed into strange characters. If I saved again, even more strange characters would appear. So:

If I don’t need anything other than English is there anything wrong with sticking to windows-1252? Any security/hacking issues?
Are there any implications of NOT using UTF-8 if you are using HTML5 (since that is the default encoding for HTML5)?
If it’s recommended that I switch to UTF-8, how do I get the currently stored square root symbols (and other symbols) to work?
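The "more strange characters on every save" symptom is consistent with text being written as UTF-8 and read back as windows-1252 somewhere in the save path. That cause is an assumption, not something confirmed for this Oracle/ColdFusion setup; the Python sketch below only reproduces the pattern, and misread_once is a hypothetical name.

```python
def misread_once(text: str) -> str:
    """Encode as UTF-8, then wrongly decode the bytes as Windows-1252."""
    return text.encode("utf-8").decode("cp1252")


original = "√"                   # U+221A, three bytes in UTF-8
first = misread_once(original)   # one symbol becomes three characters
second = misread_once(first)     # three characters become six

print(len(original), len(first), len(second))

# One round of damage can be undone by reversing the two steps.
repaired = first.encode("cp1252").decode("utf-8")
print(repaired == original)
```

Each round trip through the mismatched layer multiplies the garbage, which is why the stored value gets worse with every save.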

I’ve already read all these pages, but I’m still having a little trouble grasping it all. Hoping someone here can help clarify it for me. Thanks!

https://www.owasp.org/index.php/Canonicalization,_locale_and_Unicode
Excellent description of how UTF-8 came about, why it’s awesome, and the problems it solves: https://www.youtube.com/watch?v=MijmeoH9LT4
http://www.w3.org/International/questions/qa-choosing-encodings “Use UTF-8, if you can”. “In fact the HTML5 specification draft currently says «Authors are encouraged to use UTF-8. Conformance checkers may advise authors against using legacy encodings. Authoring tools should default to using UTF-8 for newly-created documents.»”
http://www.w3schools.com/tags/ref_charactersets.asp “For HTML5, the default character encoding is UTF-8.”
http://www.joelonsoftware.com/articles/Unicode.html

* * * UPDATE * * *

I appreciate all the help so far to make this easier to understand. I’ll simplify the original 3 questions so hopefully a clear answer can be reached, so here it is: The customer doesn’t need support for other languages, they will be using some HTML5 tags and a TON of JSON/XML traffic sent back and forth via jQuery.ajax(). Given that info, from a security standpoint, is there anything wrong with keeping the database set to NLS_CHARACTERSET: WE8MSWIN1252 and the webpages set to ? Thank you.

Here is another question that is a slight spin-off from this one: Why am I able to use a character that’s not part of a charset (windows-1252)?
