Opened 14 years ago

Closed 14 years ago

#1030 closed bug (Invalid)

Creates files in UTF-8 only

Reported by: wRAR
Owned by: andar
Priority: major
Milestone:
Component: Unknown
Version: 1.1.9
Keywords:
Cc: wrar@…


As far as I can see, filenames are stored internally as UTF-8 (a strange decision: unicode strings should be used for internal handling), and when it comes to on-disk files, they are created with UTF-8 names no matter what locale and filesystem encoding are actually in use.

Change History (3)

comment:1 Changed 14 years ago by andar

I'm not quite sure I understand. The BitTorrent spec requires that file paths are stored as UTF-8 in the metadata, so it makes sense for us to use UTF-8 internally. Also, UTF-8 is a Unicode encoding.
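As a rough illustration of the metadata point, here is a minimal sketch assuming the path of a file in a multi-file torrent has already been bdecoded into a list of UTF-8 byte strings. The helper name torrent_path_to_text is hypothetical and is not taken from Deluge or libtorrent.

```python
def torrent_path_to_text(components):
    """Decode UTF-8 path components (hypothetical helper) and join them.

    `components` is assumed to be the already-bdecoded "path" list of one
    file entry, i.e. a list of byte strings encoded as UTF-8.
    """
    return "/".join(part.decode("utf-8") for part in components)


# Example input: a directory name and a Cyrillic file name, both UTF-8 bytes.
example = [b"music", b"\xd1\x84\xd0\xb0\xd0\xb9\xd0\xbb.txt"]
text_path = torrent_path_to_text(example)
```

The result is a text string; which bytes end up on disk is a separate decision made when the file is created.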

comment:2 Changed 14 years ago by wRAR

  • Cc wrar@… added

File names are stored on a UNIX filesystem as sequences of bytes in some encoding. When that encoding matches the current locale, the file name is readable; when it does not, it usually is not. Deluge writes file names as UTF-8 strings even when the current locale is not UTF-8. I couldn't find the code where files are created, but because the internal representation is a UTF-8 string, I can safely assume that these strings are passed to the OS unmodified. If you pass a unicode string to the open() builtin, the file name is encoded using the current locale encoding automatically. If you pass an 8-bit string, the file is created with the bytes of that string as its name. If that 8-bit string contains UTF-8 data, the file name will contain UTF-8 data, and if the locale is not UTF-8, that file name will be unreadable.
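The effect described here can be sketched without touching the filesystem. The example below assumes a KOI8-R locale purely for illustration; the helper names name_on_disk and as_seen_in_locale are hypothetical, and the code only simulates which bytes would be written and how a non-UTF-8 locale would display them.

```python
def name_on_disk(name, encoding):
    """Return the bytes a file name would occupy on disk in `encoding`."""
    return name.encode(encoding)


def as_seen_in_locale(raw, locale_encoding):
    """Return how raw file name bytes are displayed under a given locale."""
    return raw.decode(locale_encoding, errors="replace")


name = u"\u0444\u0430\u0439\u043b.txt"  # Cyrillic file name

utf8_bytes = name_on_disk(name, "utf-8")   # what a UTF-8-only writer stores
koi8_bytes = name_on_disk(name, "koi8-r")  # what a KOI8-R locale expects

# Bytes written in the locale's own encoding display correctly.
readable = as_seen_in_locale(koi8_bytes, "koi8-r")

# UTF-8 bytes interpreted as KOI8-R display as unrelated characters.
garbled = as_seen_in_locale(utf8_bytes, "koi8-r")
```

Here readable equals the original name, while garbled does not, which is the mismatch the report complains about.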

And when I speak about unicode strings, I mean Python's built-in 'unicode' type.

And when I suggest using unicode strings as the only internal representation of string data, I am not only repeating the general convention for unicode-aware programming languages, I am also preparing you for Py3k, whose strings are the same as the 'unicode' data type of Python 2.x and where that convention is actually enforced.
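To illustrate what "enforced" means in practice, here is a small sketch for Python 3, where text (str) and 8-bit data (bytes) are distinct types that cannot be mixed implicitly. The function name mixing_is_rejected is hypothetical.

```python
def mixing_is_rejected():
    """Return True if Python 3 refuses to join text and bytes implicitly."""
    try:
        "downloads/" + b"file.txt"  # str + bytes
    except TypeError:
        return True
    return False


# The conversion has to be spelled out with an explicit encoding instead.
explicit = "downloads/" + b"file.txt".decode("utf-8")
```

Under Python 2.x the same concatenation of a unicode and an 8-bit string would be attempted with an implicit ASCII decode, which is how mixed representations can go unnoticed.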

comment:3 Changed 14 years ago by andar

  • Resolution set to invalid
  • Status changed from new to closed

You are actually talking about how libtorrent behaves, not necessarily Deluge. The whole argument about using Python unicode types is moot since libtorrent handles writing the files to disk. You may wish to file a bug with libtorrent regarding this problem.
