Opened 8 years ago

Closed 8 years ago

Last modified 4 years ago

#2216 closed bug (Fixed)

rencode changes str/unicode types

Reported by: gazpachoking
Owned by:
Priority: minor
Milestone: 2.x
Component: Unknown
Version: 1.3.5
Keywords:
Cc:

Description

Whether a bytestring or a unicode object is passed to rencode.dumps, loading the result yields a bytestring if the text contained only ASCII characters, and a unicode object if it contained any non-ASCII characters. As a result, there is no way to know which type of object will be passed into RPC-exported methods.

The easiest fix would be to have rencode always return unicode objects for strings. I'm not sure it's worth the effort to have it return the same type of object that was passed in.

Change History (8)

comment:1 Changed 8 years ago by gazpachoking

Example:

from deluge import rencode
def ident(x):
    print repr(rencode.loads(rencode.dumps(x)))

ident(u'a')  # unicode in, str out
ident('a')  # str in, str out
ident(u'\xf8')  # unicode in, unicode out
ident(u'\xf8'.encode('utf8'))  # str in, unicode out

### Result
'a'
'a'
u'\xf8'
u'\xf8'

comment:2 Changed 8 years ago by gazpachoking

The other concern is the behavior of andar's Cython implementation of rencode. I haven't tested that, but it might be easier to fix this at the RPC level rather than the rencode level.

comment:3 Changed 8 years ago by gazpachoking

Simplest possible fix I can think of: always output unicode objects from rencode (unless a bytestring in some encoding other than UTF-8 was input; don't do that). Maybe we could even add an assert to catch that case.

  • deluge/rencode.py

    @@ -160,9 +160,7 @@
         colon += 1
         s = x[colon:colon+n]
         try:
    -        t = s.decode("utf8")
    -        if len(t) != len(s):
    -            s = t
    +        s = s.decode("utf8")
         except UnicodeDecodeError:
             pass
         return (s, colon+n)
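
To make the effect of the patch easier to see, here is a rough standalone sketch of the two decoding strategies. It is not the rencode source: the function names `decode_old` and `decode_new` are made up for illustration, and it is written for Python 3, where `bytes` and `str` stand in for the Python 2 `str` and `unicode` types discussed in this ticket.

```python
def decode_old(s):
    """Pre-patch logic: keep the decoded text only if decoding changed the length.

    Pure-ASCII input decodes to the same length, so it stays a byte string.
    """
    try:
        t = s.decode("utf8")
        if len(t) != len(s):
            s = t
    except UnicodeDecodeError:
        pass
    return s


def decode_new(s):
    """Patched logic: always return text when the bytes are valid UTF-8."""
    try:
        s = s.decode("utf8")
    except UnicodeDecodeError:
        # Bytes in some other encoding are passed through untouched.
        pass
    return s


if __name__ == "__main__":
    samples = (b"a", "\xf8".encode("utf8"), b"\xff")
    for raw in samples:
        old_type = type(decode_old(raw)).__name__
        new_type = type(decode_new(raw)).__name__
        print(repr(raw), "old:", old_type, "new:", new_type)
```

With the old logic the result type depends on the content of the string. With the patched logic it depends only on whether the bytes are valid UTF-8.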

comment:4 Changed 8 years ago by gazpachoking

  • Milestone changed from 1.3.x to 1.3.6

comment:5 Changed 8 years ago by gazpachoking

  • Milestone changed from 1.3.6 to 1.4.0

comment:6 Changed 8 years ago by gazpachoking

We need to update rencode to always output strings as UTF-8 encoded byte strings (this is in line with the Cython implementation). The RPC code should then always decode these before calling the exported methods.
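
A minimal sketch of what decoding at the RPC level could look like. This is not the code from the commits that closed this ticket: `decode_strings` and `call_exported` are hypothetical helpers, and the sketch uses Python 3 `bytes`/`str` in place of Python 2 `str`/`unicode`. It assumes every byte string that arrives is valid UTF-8, so anything else raises `UnicodeDecodeError`.

```python
def decode_strings(obj, encoding="utf8"):
    """Recursively decode byte strings inside lists, tuples and dicts.

    Hypothetical helper: other value types are returned unchanged.
    """
    if isinstance(obj, bytes):
        return obj.decode(encoding)
    if isinstance(obj, (list, tuple)):
        # Rebuild the same container type with decoded items.
        return type(obj)(decode_strings(item, encoding) for item in obj)
    if isinstance(obj, dict):
        return {
            decode_strings(key, encoding): decode_strings(value, encoding)
            for key, value in obj.items()
        }
    return obj


def call_exported(method, args, kwargs):
    """Decode all string arguments, then call the exported method."""
    return method(*decode_strings(args), **decode_strings(kwargs))


if __name__ == "__main__":
    def rename(name, note=None):
        return (name, note)

    # Both the positional and the keyword argument arrive as byte strings.
    print(call_exported(rename, (b"torrent",), {b"note": "\xf8".encode("utf8")}))
```

Doing the decoding in one place means exported methods see a single string type regardless of which rencode implementation produced the message.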

comment:7 Changed 8 years ago by gazpachoking

  • Resolution set to fixed
  • Status changed from new to closed

Fixed. RPC calls now deliver all string arguments as unicode objects. Relevant commits: 6b5cf3396 2187cef14

comment:8 Changed 4 years ago by Cas

  • Milestone changed from 2.0.x to 2.x

Milestone renamed
