Opened 4 years ago
Closed 4 years ago
#3440 closed bug (Fixed)
charset should be ignored for application/x-bittorrent
Reported by: | megaksa | Owned by: | |
---|---|---|---|
Priority: | major | Milestone: | 2.0.4 |
Component: | Core | Version: | master |
Keywords: | | Cc: | |
Description
Deluge version: 2.0.4.dev38 on Arch Linux.
I use the delugesiphon Chrome plugin to add new torrents to the server. However, after switching to Arch, which ships the latest Deluge, the plugin no longer works for rutracker.org. After investigating, I found that the issue is in Deluge itself, specifically in httpdownloader. When a torrent is requested for download, rutracker responds with the header:
Content-Type: application/x-bittorrent; charset=Windows-1251
Specifying a charset for this content type doesn't make sense IMO, so I suggest not re-encoding to UTF-8 for anything other than text/* MIME types.
Attached is a suggested fix, produced with:
diff /usr/lib/python3.8/site-packages/deluge/httpdownloader.py /usr/lib/python3.8/site-packages/deluge/httpdownloader_fixed.py
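Here is a minimal sketch of why transcoding breaks a torrent. It assumes, as described above, that the downloader decodes the body with the header's charset and then re-encodes it to UTF-8. The payload is a tiny hand-made bencoded dictionary, not a real rutracker file. Its four raw bytes stand in for binary data such as piece hashes. Bencoded strings carry a byte-length prefix (`4:` here), so any change in byte count invalidates the file.

```python
# Hypothetical minimal bencoded torrent fragment: a dict with one key, "pieces",
# whose 4-byte value stands in for raw binary data (e.g. SHA-1 piece hashes).
payload = b"d6:pieces4:\xc0\xe1\xe2\xe3e"


def transcode(data: bytes, src_charset: str) -> bytes:
    """Decode with the charset taken from the header, then re-encode as UTF-8.

    This mimics the re-encoding step discussed in the ticket; it is not
    Deluge's actual code.
    """
    return data.decode(src_charset).encode("utf-8")


reencoded = transcode(payload, "windows-1251")

# Each of the bytes 0xC0, 0xE1, 0xE2, 0xE3 decodes to a Cyrillic letter that
# needs two bytes in UTF-8, so the value grows from 4 to 8 bytes while the
# bencoded length prefix still says "4:".
print(len(payload), len(reencoded))  # 16 20
print(reencoded == payload)          # False
```

Decoding arbitrary binary data can also fail outright, because some single-byte codecs leave a few byte values unmapped.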
Attachments (1)
Change History (5)
by , 4 years ago
Attachment: | charset_fix.diff added |
---|
comment:1 by , 4 years ago
We need a bit more information about the exact problem. Are the torrent downloads corrupted when decoded with Windows-1251? What error are you encountering?
I am wary of changing the way httpdownloader works, as it could have unintended consequences, but I understand your reasoning for restricting re-encoding to text content types.
I would perhaps propose just not re-encoding application/x-bittorrent (it should be binary...), i.e. in request_callback, don't set the encoding if the content type is application/x-bittorrent.
comment:2 by , 4 years ago
Milestone: | needs verified → 2.0.4 |
---|
comment:3 by , 4 years ago
Correct. Example of torrent download headers:
Content-Type: application/x-bittorrent; charset=Windows-1251
Content-Disposition: attachment; filename="[rutracker.org].t5778456.torrent"; filename*=UTF-8''%D0%91%D0%B8%D0%B1%D0%BB%D0%B8%D0%BE%D1%82%D0%B5%D0%BA%D0%B0%20%D0%9C%D1%83%D1%80%D0%B7%D0%B8%D0%BB%D0%BA%D0%B8%20-%20%D0%92%D0%B0%D1%80%D0%BC%D1%83%D0%B6%20%D0%92.%20-%20%D0%9C%D0%BE%D1%81%D1%82%D0%BE%D1%80%D0%B3%20%5B1930%2C%20PDF%2C%20RUS%5D%20%5Brutracker-5778456%5D.torrent
AFAIR, charset handling is generally defined for textual MIME types (RFC 6657), i.e. those of the text/* type. For other types, the handling is defined by each type's own documentation. For binary types, a charset may indicate, for example, the encoding of text stored inside the file (such as tag strings inside a torrent file, or possibly inside an MP3 file). So, in general, binary files cannot be re-encoded the way textual files can. I think the right approach is not to re-encode unless the file is textual. Your proposed solution is therefore less correct, although it would also work in my particular case. I'd do the reverse and re-encode only text/*. Which types besides text/* do you know of whose content needs re-encoding?
comment:4 by , 4 years ago
Resolution: | → Fixed |
---|---|
Status: | new → closed |
Yeah, I see what you mean and agree that we should only re-encode text content types. So I have modified your patch, added a test and merged it to develop: [4d970754a4a]
Thanks for the detailed report and suggested fix!
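The rule agreed on in this ticket can be sketched as follows: apply charset-based re-encoding only to text/* content and pass everything else through untouched. This is an illustration under stated assumptions, not the merged patch [4d970754a4a]. The helper names `parse_content_type` and `maybe_reencode` are hypothetical. The hand-rolled parser also handles only the simple `type/subtype; charset=...` form seen in these headers.

```python
from typing import Optional, Tuple


def parse_content_type(header: str) -> Tuple[str, Optional[str]]:
    """Split a Content-Type value into (lowercased MIME type, charset or None).

    Simplified, hypothetical helper: beyond stripping surrounding double
    quotes from the charset value, it ignores MIME quoting rules.
    """
    mime, _, params = header.partition(";")
    charset = None
    for param in params.split(";"):
        key, sep, value = param.strip().partition("=")
        if sep and key.strip().lower() == "charset":
            charset = value.strip().strip('"')
    return mime.strip().lower(), charset


def maybe_reencode(body: bytes, content_type: Optional[str]) -> bytes:
    """Re-encode text/* bodies to UTF-8; return every other body unchanged."""
    if not content_type:
        return body
    mime, charset = parse_content_type(content_type)
    if mime.startswith("text/") and charset:
        # Charset re-encoding is applied to textual media types only.
        return body.decode(charset).encode("utf-8")
    # Binary types such as application/x-bittorrent are passed through as-is,
    # even if the server attaches a charset parameter.
    return body


torrent = b"d6:pieces4:\xc0\xe1\xe2\xe3e"
header = "application/x-bittorrent; charset=Windows-1251"
print(maybe_reencode(torrent, header) == torrent)  # True

page = "Привет".encode("windows-1251")
print(maybe_reencode(page, "text/plain; charset=Windows-1251").decode("utf-8"))  # Привет
```

This version keys on text/* rather than excluding application/x-bittorrent specifically. As a result, other binary types that arrive with a stray charset parameter are also left intact, which is the argument made in comment:3.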