Opened 12 years ago

Closed 11 years ago

Last modified 7 years ago

#2039 closed bug (Fixed)

torrentmanager.py line 1023, in on_alert_tracker_warning - UnicodeDecodeError

Reported by: non7top
Owned by:
Priority: minor
Milestone: 2.x
Component: Core
Version: master
Keywords:
Cc:

Description

Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/deluge/main.py", line 253, in start_daemon
    Daemon(options, args)
  File "/usr/lib/python2.7/site-packages/deluge/core/daemon.py", line 160, in __init__
    reactor.run()
  File "/usr/lib/python2.7/site-packages/twisted/internet/base.py", line 1162, in run
    self.mainLoop()
  File "/usr/lib/python2.7/site-packages/twisted/internet/base.py", line 1171, in mainLoop
    self.runUntilCurrent()
--- <exception caught here> ---
  File "/usr/lib/python2.7/site-packages/twisted/internet/base.py", line 793, in runUntilCurrent
    call.func(*call.args, **call.kw)
  File "/usr/lib/python2.7/site-packages/deluge/core/torrentmanager.py", line 1023, in on_alert_tracker_warning
    tracker_status = '%s: %s' % (_("Warning"), str(alert.message()))
exceptions.UnicodeDecodeError: 'ascii' codec can't decode byte 0xd0 in position 0: ordinal not in range(128)

Looks like a wrong utf conversion

Change History (13)

comment:1 Changed 12 years ago by Cas

  • Component changed from other to core
  • Milestone changed from Future to 1.4.0
  • Version changed from other (please specify) to git master

I am struggling to replicate this but could you try this in the meantime:

diff --git a/deluge/core/torrentmanager.py b/deluge/core/torrentmanager.py
index 996f5c4..1e909d1 100644
--- a/deluge/core/torrentmanager.py
+++ b/deluge/core/torrentmanager.py
@@ -960,7 +960,7 @@ def on_alert_tracker_warning(self, alert):
             torrent = self.torrents[str(alert.handle.info_hash())]
         except:
             return
-        tracker_status = '%s: %s' % (_("Warning"), str(alert.message()))
+        tracker_status = '%s: %s' % (_("Warning"), alert.msg)
         # Set the tracker status for the torrent
         torrent.set_tracker_status(tracker_status)

comment:2 Changed 12 years ago by non7top

I've been running this patch for an hour and the error hasn't reappeared so far (it was showing up about every 10 minutes before that), so this most probably resolved it.

In the deluged log I see the following message, but nothing more than that. I'm also not able to identify the torrent that is causing it.

04:41:54.554 [DEBUG ][deluge.core.torrentmanager :1018] on_alert_tracker_warning

comment:3 Changed 12 years ago by bro

This is not solved in master.

>>> s = 'String with \xe5'
>>> tracker_status = '%s: %s' % (u"Warning", s)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe5 in position 12: ordinal not in range(128)

The tracker status has also created problems when decoding the RPC message with rencode.
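
For readers on Python 3, where byte strings and text are no longer coerced implicitly, the same failure can be reproduced by making the ASCII decode explicit. This is only a sketch under that assumption; the variable names are illustrative and not taken from Deluge.

```python
# Python 3 sketch: Python 2 performed this ASCII decode implicitly when a
# byte string was formatted together with a unicode string.
raw = b'String with \xe5'

failed = False
bad_offset = None
try:
    raw.decode('ascii')
except UnicodeDecodeError as exc:
    failed = True
    # exc.start is the offset of the first byte that could not be decoded
    bad_offset = exc.start

print(failed, bad_offset)
```

The reported offset matches the "position 12" in the traceback above, since the offending byte follows the 12 characters of "String with ".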

comment:4 Changed 12 years ago by andar

Hmm.. I wonder if this is a bug with libtorrent. As far as I understand it, it should only be giving us UTF-8 strings, but it may not look at tracker responses and just pass them through, which would make sense in this case.

comment:5 Changed 12 years ago by Cas

The solution is to use alert.message().decode("utf8") which I need to commit as part of the fix for #2007.

comment:6 Changed 12 years ago by Cas

Where did \xe5 come from? That is a Unicode character, but the test is using it in an ASCII string, so it is bound to fail with UnicodeDecodeError.

If it is just the non-ASCII character you are testing, then it would be either:

s = u'String with \xe5'
tracker_status = '%s: %s' % (u"Warning", s)

or

s = 'String with \xc3\xa5'
tracker_status = '%s: %s' % (u"Warning", s.decode('utf8'))

The latter being the correct example for what we expect from libtorrent.

comment:7 Changed 12 years ago by andar

It's likely coming from a bad tracker. I talked to hydri and he said that libtorrent passes the tracker's response verbatim, so it does no verification of whether it is in fact utf8 or not. Basically, we can't trust that the tracker is following the bencode spec, which specifies that all strings should be utf8 encoded.

As I mentioned in IRC, we really should be decoding with 'ignore' so that we handle non-utf8 byte strings correctly.
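
To illustrate what decoding with 'ignore' does to such a message, here is a minimal sketch assuming Python 3 `bytes`. The sample text is modelled on the tracker message quoted in this ticket, and the variable names are illustrative only.

```python
# A tracker message containing a single Latin-1 byte (0xe5), which is not
# valid UTF-8 in this position.
raw = b'Message with bad character \xe5 in the middle!'

# Strict decoding raises, which is the failure seen in this ticket.
strict_failed = False
try:
    raw.decode('utf8')
except UnicodeDecodeError:
    strict_failed = True

# With 'ignore', the undecodable byte is dropped and the rest is kept.
cleaned = raw.decode('utf8', 'ignore')
print(repr(cleaned))
```

The trade-off is that the bad character silently disappears from the status text, leaving the two surrounding spaces next to each other.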

comment:8 Changed 12 years ago by bro

I logged the RPC message sent from the daemon with the torrent list, and the dictionary entry looks like this (almost): 'tracker_status': 'site: Error: Message with bad character \xe5 in the middle!',

The string is in fact latin1 (ISO-8859-1), so this works:

>>> s = 'String with \xe5'
>>> s = s.decode("ISO-8859-1")
>>> tracker_status = '%s: %s' % (u"Warning", s)
>>> tracker_status
u'Warning: String with \xe5'

What about first trying to decode as utf8, and if that fails, trying ISO-8859-1? If both fail, use decode("utf8", "ignore").

Something like this:

#!/usr/bin/env python

import codecs

error_occured = False

def decode_string(s):
    global error_occured

    if type(s) is unicode:
        return s

    def error_handler(exc):
        """This also avoids pesky prints to terminal by decode when it fails"""
        global error_occured
        error_occured = True
        return (u"", exc.end)
    codecs.register_error("decoding-error-handler", error_handler)

    s2 = s.decode("utf8", "decoding-error-handler")
    if not error_occured:
        return s2

    error_occured = False
    s2 = s.decode("ISO-8859-1", "decoding-error-handler")
    if not error_occured:
        return s2
    return s.decode("utf8", "ignore")

s1 = 'String with \xe5'
s2 = u'String with \xe5'
s3 = 'String with \xc3\xa5'
print "decode s1:", decode_string(s1)
print "decode s2:", decode_string(s2)
print "decode s3:", decode_string(s3)

Outputs the following:

decode s1: String with å
decode s2: String with å
decode s3: String with å
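
The same fallback order (utf8, then ISO-8859-1, then utf8 with 'ignore') can also be written with plain try/except, without a custom error handler or global state. This is a minimal sketch assuming Python 3, where byte strings are `bytes`; `decode_bytes` is a hypothetical name and not the function that exists in Deluge.

```python
def decode_bytes(raw, encodings=('utf8', 'iso-8859-1')):
    """Hypothetical helper: decode bytes by trying each encoding in order."""
    if isinstance(raw, str):
        # Already text, nothing to decode.
        return raw
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Last resort: drop any bytes that are not valid UTF-8.
    return raw.decode('utf8', 'ignore')


s1 = b'String with \xe5'       # Latin-1 encoded bytes
s2 = 'String with \xe5'        # already text
s3 = b'String with \xc3\xa5'   # UTF-8 encoded bytes
for label, value in (('s1', s1), ('s2', s2), ('s3', s3)):
    print('decode %s:' % label, decode_bytes(value))
```

One thing to keep in mind: ISO-8859-1 assigns a character to every byte value, so that step cannot fail, and the final 'ignore' fallback is only reached if the list of encodings is changed.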

comment:9 Changed 12 years ago by andar

I think that all we should do is string.decode("utf8", "ignore"). The bencode specification states that strings should be utf8 encoded, so if the tracker is sending us junk then we should simply ignore it. I don't like the idea of adding all of this extra code just because a tracker can't do things properly, and removing a bad character from a tracker status message isn't really a big deal.

comment:10 Changed 12 years ago by Cas

There is a decode_string function in common that can be updated to try utf8, then latin1, then utf8 with ignore. It does re-encode as utf8, though, which I find odd, as I would expect this to happen in the utf8_encoded function below it (which calls decode_string).

comment:12 Changed 11 years ago by Cas

  • Resolution set to fixed
  • Status changed from new to closed

Fixed in master: 4b99a3977

comment:13 Changed 7 years ago by Cas

  • Milestone changed from 2.0.x to 2.x

Milestone renamed
