Opened 4 years ago

Closed 4 years ago

#3440 closed bug (Fixed)

charset should be ignored for application/x-bittorrent

Reported by: megaksa
Owned by:
Priority: major
Milestone: 2.0.4
Component: Core
Version: master
Keywords:
Cc:

Description

deluge version: 2.0.4.dev38 on Arch Linux.

I'm used to using the delugesiphon Chrome plugin to add new torrents to the server. However, after switching to Arch, which ships the latest Deluge, the plugin no longer works for rutracker.org. After investigating, I found the issue to be in Deluge itself, specifically in httpdownloader. When a torrent download is requested, rutracker responds with the header: Content-Type: application/x-bittorrent; charset=Windows-1251

Providing a charset for this content type doesn't make sense IMO, but since servers do it anyway, I suggest not re-encoding to UTF-8 for anything besides 'text/...' MIME types.
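
To illustrate why re-encoding is harmful here, the following is a minimal sketch, not Deluge's actual code. It assumes the downloader decodes the body with the advertised charset and re-encodes it as UTF-8. The payload is a made-up stand-in for a torrent file: a bencoded key followed by 20 raw hash-like bytes.

```python
# Hypothetical stand-in for a torrent body: a bencoded dict whose value is
# 20 raw bytes (0xC0..0xD3). Windows-1251 maps these bytes to Cyrillic letters.
payload = b"d6:pieces20:" + bytes(range(0xC0, 0xD4)) + b"e"

# Assumed behaviour: decode with the charset from the Content-Type header,
# then re-encode as UTF-8.
reencoded = payload.decode("windows-1251").encode("utf-8")

# Each of the 20 high bytes becomes a two-byte UTF-8 sequence, so the
# binary content no longer matches what the server sent.
print(len(payload), len(reencoded))
print(reencoded == payload)
```

The declared length prefix (20:) no longer matches the data that follows, so a bencode parser would reject or misread the result.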

Attached is a suggested fix produced by diff /usr/lib/python3.8/site-packages/deluge/httpdownloader.py /usr/lib/python3.8/site-packages/deluge/httpdownloader_fixed.py

Attachments (1)

charset_fix.diff (839 bytes) - added by megaksa 4 years ago.


Change History (5)

by megaksa, 4 years ago

Attachment: charset_fix.diff added

comment:1 by Calum, 4 years ago

We need a bit more information about the exact problem. Are the torrent downloads being corrupted by decoding with Windows-1251? What error are you encountering?

I am wary of changing the way httpdownloader works, as it could have unintended consequences, but I understand your reasoning for re-encoding only text content types.

I would perhaps propose to just not re-encode application/x-bittorrent (it should be binary...) so in request_callback don't set encoding if content-type is application/x-bittorrent.

if "application/x-bittorrent" not in content_type:
    encoding = charset

comment:2 by Calum, 4 years ago

Milestone: needs verified → 2.0.4

comment:3 by megaksa, 4 years ago

Correct. Example of torrent download headers:

Content-Type: application/x-bittorrent; charset=Windows-1251
Content-Disposition: attachment; filename="[rutracker.org].t5778456.torrent"; filename*=UTF-8''%D0%91%D0%B8%D0%B1%D0%BB%D0%B8%D0%BE%D1%82%D0%B5%D0%BA%D0%B0%20%D0%9C%D1%83%D1%80%D0%B7%D0%B8%D0%BB%D0%BA%D0%B8%20-%20%D0%92%D0%B0%D1%80%D0%BC%D1%83%D0%B6%20%D0%92.%20-%20%D0%9C%D0%BE%D1%81%D1%82%D0%BE%D1%80%D0%B3%20%5B1930%2C%20PDF%2C%20RUS%5D%20%5Brutracker-5778456%5D.torrent

AFAIR, charset handling is generally defined only for textual MIME types (RFC 6657), i.e. those of type text/*. For everything else, handling is defined by the documentation of the specific type. For binary types the charset may indicate, e.g., an internal encoding (such as the encoding of tags inside a binary file format, in particular inside a torrent file, or perhaps inside an mp3 file). So binary files generally cannot be re-encoded the way text files can.

I think the right approach is to skip the re-encoding unless the file is textual. Your proposed solution is therefore less correct, although it would also work in my particular case. I'd go the other way round. Are there known types besides text/* for which you want the content re-encoded?
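
The two policies discussed in this thread can be compared with a small sketch. This is not the merged patch. The function names are hypothetical, and the header is parsed with the standard library's email.message.Message rather than whatever httpdownloader uses internally.

```python
from email.message import Message


def _parse(content_type_header):
    """Wrap a raw Content-Type value so the stdlib can parse it."""
    msg = Message()
    msg["Content-Type"] = content_type_header
    return msg


def charset_unless_torrent(content_type_header):
    """Policy from comment:1 (hypothetical name): skip only torrents."""
    msg = _parse(content_type_header)
    if msg.get_content_type() == "application/x-bittorrent":
        return None
    return msg.get_content_charset()


def charset_for_text_only(content_type_header):
    """Policy from the report (hypothetical name): honour charset for text/* only."""
    msg = _parse(content_type_header)
    if msg.get_content_maintype() != "text":
        return None
    return msg.get_content_charset()


header = "application/x-bittorrent; charset=Windows-1251"
print(charset_unless_torrent(header), charset_for_text_only(header))
```

Both functions return None for the rutracker header, so neither would trigger re-encoding. They differ for other binary types: given "application/octet-stream; charset=Windows-1251", only the text-only policy leaves the body untouched.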

comment:4 by Calum, 4 years ago

Resolution: Fixed
Status: new → closed

Yeah, I see what you mean and agree we should only be re-encoding text content types. So I have modified your patch, added a test and merged it to develop: [4d970754a4a]

Thanks for the detailed report and suggested fix!
