Opened 3 years ago

Closed 3 years ago

#3440 closed bug (Fixed)

charset should be ignored for application/x-bittorrent

Reported by: megaksa
Owned by:
Priority: major
Milestone: 2.0.4
Component: Core
Version: master
Keywords:
Cc:

Description

deluge version: 2.0.4.dev38 on Arch Linux.

I normally use the delugesiphon Chrome plugin to add new torrents to the server. However, after switching to Arch, which ships the latest deluge, the plugin no longer works for rutracker.org. After investigating, I found the issue to be in deluge itself, in httpdownloader. When a torrent download is requested, rutracker responds with the header: Content-Type: application/x-bittorrent; charset=Windows-1251

While providing a charset for this content type doesn't make sense IMO, I suggest not re-encoding to UTF-8 for anything other than 'text/...' MIME types.

Attached is a suggested fix produced by diff /usr/lib/python3.8/site-packages/deluge/httpdownloader.py /usr/lib/python3.8/site-packages/deluge/httpdownloader_fixed.py

Attachments (1)

charset_fix.diff (839 bytes) - added by megaksa 3 years ago.


Change History (5)

Changed 3 years ago by megaksa

comment:1 Changed 3 years ago by Cas

We need a bit more information about the exact problem. Are the torrent downloads in UTF-8, and is decoding with Windows-1251 corrupting the data? What error are you encountering?

I am wary of changing the way httpdownloader works as it could have unintended consequences.

I would propose not re-encoding application/x-bittorrent (it should be UTF-8...), so in request_callback don't set the encoding if the content type is application/x-bittorrent:

if "application/x-bittorrent" not in content_type:
    encoding = charset

comment:2 Changed 3 years ago by Cas

  • Milestone changed from needs verified to 2.0.4

comment:3 Changed 3 years ago by megaksa

Correct. Example of torrent download headers:

Content-Type: application/x-bittorrent; charset=Windows-1251
Content-Disposition: attachment; filename="[rutracker.org].t5778456.torrent"; filename*=UTF-8''%D0%91%D0%B8%D0%B1%D0%BB%D0%B8%D0%BE%D1%82%D0%B5%D0%BA%D0%B0%20%D0%9C%D1%83%D1%80%D0%B7%D0%B8%D0%BB%D0%BA%D0%B8%20-%20%D0%92%D0%B0%D1%80%D0%BC%D1%83%D0%B6%20%D0%92.%20-%20%D0%9C%D0%BE%D1%81%D1%82%D0%BE%D1%80%D0%B3%20%5B1930%2C%20PDF%2C%20RUS%5D%20%5Brutracker-5778456%5D.torrent

AFAIR charset treatment is generally defined for textual MIME types (RFC 6657), i.e. those of type text/*. For all other types, the treatment is defined by each type's own documentation. For binary types it may indicate, for example, an internal encoding (such as the encoding of tags inside the binary representation, particularly inside a torrent file, maybe inside an mp3 file). So in general binary files cannot be re-encoded the way textual files can. I think the right approach is not to re-encode unless the content is textual. Your proposed solution is therefore less correct, though it would also work in my particular case. I'd go with the reverse. What are the known types besides text/* for which you are interested in re-encoding the content?
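
To illustrate why binary content cannot be re-encoded like text, here is a minimal sketch (not deluge code). The helper name reencode_to_utf8 and the sample payload are hypothetical; the payload only mimics a bencoded string holding raw binary bytes, as the piece hashes in a torrent file do.

```python
def reencode_to_utf8(data: bytes, charset: str) -> bytes:
    """Decode bytes with the advertised charset and re-encode them as UTF-8."""
    return data.decode(charset).encode("utf-8")


# Hypothetical bencoded fragment: key "pieces" followed by a 4-byte binary value.
payload = b"6:pieces4:\xe0\xe1\xe2\xe3"

converted = reencode_to_utf8(payload, "windows-1251")

# Each byte in the 0xE0..0xE3 range maps to a Cyrillic letter in Windows-1251,
# and each of those letters takes two bytes in UTF-8, so the binary value grows
# and no longer matches the declared length prefix "4:".
print(len(payload), len(converted))
print(converted == payload)
```

The ASCII part of the payload survives unchanged, but the binary part is rewritten, which is enough to make the result an invalid torrent file.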

comment:4 Changed 3 years ago by Cas

  • Resolution set to Fixed
  • Status changed from new to closed

Yeah, I see what you mean and agree we should only be re-encoding text content types. So I have modified your patch, added a test and merged to develop: [4d970754a4a]

Thanks for the detailed report and suggested fix!
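
For readers of this ticket, the agreed rule (only re-encode text/* content) can be sketched roughly as below. This is an illustration under assumptions, not the code from the merged commit: the function name charset_for_reencoding is hypothetical, and the header is parsed with the standard library's email.message.Message rather than whatever httpdownloader uses internally.

```python
from email.message import Message
from typing import Optional


def charset_for_reencoding(content_type_header: str) -> Optional[str]:
    """Return the charset to decode with, or None if the body must be left as-is.

    Hypothetical helper: only text/* responses are considered for re-encoding.
    """
    msg = Message()
    msg["Content-Type"] = content_type_header
    if msg.get_content_maintype() != "text":
        # Binary types such as application/x-bittorrent are never re-encoded,
        # even if the server sends a charset parameter.
        return None
    # get_content_charset() returns the charset lowercased, or None if absent.
    return msg.get_content_charset()


print(charset_for_reencoding("application/x-bittorrent; charset=Windows-1251"))
print(charset_for_reencoding("text/html; charset=Windows-1251"))
```

With the rutracker header from this ticket the helper returns None, so the downloaded bytes would be written out untouched.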
