Opened 12 years ago

Closed 12 years ago

Last modified 7 years ago

#2216 closed bug (Fixed)

rencode changes str/unicode types

Reported by: Chase
Owned by:
Priority: minor
Milestone: 2.x
Component: Unknown
Version: 1.3.5
Keywords:
Cc:

Description

Regardless of whether a bytestring or a unicode object is passed to rencode.dumps, rencode.loads returns a bytestring if the value contained only ASCII characters and a unicode object if it contained any non-ASCII characters. As a result, there is no way to know what type of object will be passed into RPC-exported methods.

The easiest fix would be to have rencode always return unicode objects for strings. I'm not sure it's worth making it return the same type of object that was passed in.

Change History (8)

comment:1 by Chase, 12 years ago

Example:

from deluge import rencode
def ident(x):
    print repr(rencode.loads(rencode.dumps(x)))

ident(u'a')  # unicode in, str out
ident('a')  # str in, str out
ident(u'\xf8')  # unicode in, unicode out
ident(u'\xf8'.encode('utf8'))  # str in, unicode out

### Result
'a'
'a'
u'\xf8'
u'\xf8'

comment:2 by Chase, 12 years ago

The other concern is the behavior of andar's Cython implementation of rencode. I haven't tested that, but it may be easier to fix this at the RPC level rather than at the rencode level.

comment:3 by Chase, 12 years ago

Simplest possible fix I can think of: always output unicode objects from rencode (unless the input was a bytestring in some encoding other than UTF-8; don't do that). We could even add an assert for that case to catch any possible problems.

  • deluge/rencode.py

    @@ -160,9 +160,7 @@
         colon += 1
         s = x[colon:colon+n]
         try:
    -        t = s.decode("utf8")
    -        if len(t) != len(s):
    -            s = t
    +        s = s.decode("utf8")
         except UnicodeDecodeError:
             pass
         return (s, colon+n)
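
To make the effect of this patch concrete, here is a simplified sketch of the string decoder before and after the change. It is not the actual rencode code: the helper `_split` and the names `decode_string_old` and `decode_string_new` are made up for illustration, and it assumes the input is a plain `<length>:<payload>` item, ignoring the other string encodings rencode supports.

```python
# Simplified stand-ins for rencode's length-prefixed string decoder.
# Assumption: the buffer holds b"<length>:<payload>" starting at offset f.
# All function names here are hypothetical, chosen only for this sketch.

def _split(x, f):
    """Return (payload, end_offset) for a b"<n>:<payload>" item at offset f."""
    colon = x.index(b":", f)
    n = int(x[f:colon])
    colon += 1
    return x[colon:colon + n], colon + n


def decode_string_old(x, f=0):
    """Pre-patch behaviour: only non-ASCII payloads are returned as unicode."""
    s, end = _split(x, f)
    try:
        t = s.decode("utf8")
        # Lengths differ only when the payload has multi-byte characters.
        if len(t) != len(s):
            s = t
    except UnicodeDecodeError:
        pass
    return (s, end)


def decode_string_new(x, f=0):
    """Patched behaviour: every valid UTF-8 payload is returned as unicode."""
    s, end = _split(x, f)
    try:
        s = s.decode("utf8")
    except UnicodeDecodeError:
        pass  # not valid UTF-8, so the bytestring is returned unchanged
    return (s, end)
```

With this sketch, an ASCII payload such as b"1:a" comes back as a bytestring from the old function and as a unicode object from the new one, while both return a unicode object for b"2:\xc3\xb8". A payload that is not valid UTF-8 is still returned as a bytestring in both versions, which is the case the proposed assert would catch.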

comment:4 by Chase, 12 years ago

Milestone: 1.3.x → 1.3.6

comment:5 by Chase, 12 years ago

Milestone: 1.3.6 → 1.4.0

comment:6 by Chase, 12 years ago

We need to update rencode to always output strings as UTF-8 encoded byte strings, which is in line with the Cython implementation. We should then change the RPC code to always decode these before calling the exported methods.
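
A minimal sketch of what decoding at the RPC level could look like, assuming the arguments arrive as nested tuples, lists, and dicts of UTF-8 byte strings. The function name `decode_strings` is hypothetical and this is not the code from the commits that closed this ticket.

```python
def decode_strings(obj, encoding="utf8"):
    """Recursively turn byte strings inside obj into unicode objects.

    Hypothetical helper for illustration only. Byte strings that are not
    valid in the given encoding raise UnicodeDecodeError.
    """
    if isinstance(obj, bytes):
        return obj.decode(encoding)
    if isinstance(obj, (list, tuple)):
        # Rebuild the same container type with decoded items.
        return type(obj)(decode_strings(item, encoding) for item in obj)
    if isinstance(obj, dict):
        return dict(
            (decode_strings(key, encoding), decode_strings(value, encoding))
            for key, value in obj.items()
        )
    # Numbers, None, and already-decoded strings pass through untouched.
    return obj
```

An RPC dispatcher could apply this to the positional and keyword arguments just before invoking an exported method, so the method always sees unicode objects no matter what the sender passed to rencode.dumps.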

comment:7 by Chase, 12 years ago

Resolution: fixed
Status: new → closed

Fixed. RPC calls now deliver all string arguments as unicode objects. Relevant commits: 6b5cf3396, 2187cef14

comment:8 by Calum, 7 years ago

Milestone: 2.0.x → 2.x

Milestone renamed
