Unicode input to python script


I'm trying to pass a character like γ into a Python script through Alfred (and also use it as a dict key within that script).

 

I keep getting the error 'ascii' codec can't decode byte 0xce in position 0: ordinal not in range(128).

 

Does anyone know what this is about, or how I can get Unicode characters into a Python script as input?
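
For reference, the error quoted above can be reproduced in isolation. This is only a minimal sketch, assuming the input reaches the script as UTF-8 bytes (γ is the two bytes 0xCE 0xB3); the variable names are illustrative and not taken from the actual script. In Python 2 the same failure is typically triggered implicitly, when a byte string is mixed with a Unicode string and Python tries to decode it as ASCII.

```python
# -*- coding: utf-8 -*-
# "γ" (U+03B3) encoded as UTF-8 is the two bytes 0xCE 0xB3.
raw = b'\xce\xb3'

# Decoding with the ASCII codec fails on the first byte, because 0xCE > 127.
try:
    raw.decode('ascii')
    error_message = None
except UnicodeDecodeError as err:
    error_message = str(err)

# Decoding with the codec the bytes were actually written in succeeds.
text = raw.decode('utf-8')
```

The usual remedy is to decode input explicitly, once, at the point where it enters the script.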

Link to comment

I know the topic backwards, forwards and sideways, but I'm not psychic and can't debug code I haven't seen.

 

http://www.deanishe.net/alfred-workflow/user-manual/text-encoding.html

 

It's very general, but that's the best you're going to get without posting your code that's causing the issue.

 

You might even be using Python 3 for all I know, which would change all the answers compared to Python 2.

Edited by deanishe
Link to comment

Thanks for the reply. I'm using Python 2; very rough/working code is below. I'm working on a calculator/expression parser for Alfred that uses sympy's parse_expr. I'm passing in arguments from a bash script, as in 'python myscript.py "{query}"'. The dictionary 'var_dict' that holds the Unicode characters in the script defines custom variables for sympy, so if I pass 'γ' to Alfred (which then goes through "{query}" as above), it should be recognized as 1.4. This works when I pass in ASCII characters, but not Unicode ones.

 

I probably don't need docopt here, but I've found it fun to use.

'''calculate.py [args]
Usage:
    calculate.py <query>

Options:
    -h'''

from sympy.parsing.sympy_parser import (parse_expr,standard_transformations,
                                        convert_xor,implicit_multiplication_application,
                                        split_symbols_custom,_token_splittable,TokenError)
import sympy
from sympy import N,SympifyError
from workflow import Workflow
import sys
import re
#reload(sys)
#sys.setdefaultencoding('UTF8')

# sympy.cosd = lambda x : sympy.cos( sympy.mpmath.radians(x) )
# sympy.sind = lambda x : sympy.sin( sympy.mpmath.radians(x) )
sympy.cosd = lambda x : sympy.cos( sympy.mpmath.radians(x) )
sympy.sind = lambda x : sympy.sin( sympy.mpmath.radians(x) )
sympy.tand = lambda x : sympy.tan( sympy.mpmath.radians(x) )

var_dict={u'R':287,u'gamma':1.4,u'gammae':1.3,u'γ':1.4,u'g':9.81}

def can_split(symbol):
    if symbol not in (var_dict.keys()):
        return _token_splittable(symbol)
    return False

transformation=split_symbols_custom(can_split)
transformations = (standard_transformations +(transformation,convert_xor,implicit_multiplication_application))

def main(wf):
    from docopt import docopt
    args = docopt(__doc__,wf.args)
    query=args.get('<query>').decode('UTF-8')

    with open('history.txt') as historyFile:
         historyList=historyFile.read().splitlines()
        
    history=[tuple(x.split(',')) for x in historyList[::-1]]
    
    if 'v:' in query:
        
        quer=re.compile(query.split('v:')[1],re.IGNORECASE)
        ordered=sorted(var_dict.keys(), key=lambda s: s.lower())
        for i in ordered:
            if quer.search(i) or quer.search(str(var_dict[i])):
                wf.add_item(i,
                            unicode(var_dict[i]),
                            autocomplete=query.split('v:')[0]+i)
        wf.send_feedback()
        return 0
    elif 'h:' in query:
        quer=re.compile(query.split('h:')[1],re.IGNORECASE)
        for i in history:
            if quer.search(i[0]) or quer.search(i[1]):
                wf.add_item(i[0],
                            i[1],
                            icon='history.png',
                            autocomplete=query.split('h:')[0]+i[0])
        wf.send_feedback()
        return 0
    
    try:
        parsed=parse_expr(query,local_dict=var_dict,transformations=transformations)
        result=unicode(N(parsed).round(10))
    
    except TypeError:
        try:
            parsed=parse_expr(query,local_dict=var_dict,transformations=transformations)
            result=unicode(N(parsed))
        except TypeError:
            result=u'...'

    except (TokenError,SyntaxError,SympifyError):
        result=u'...'
    
        #parsed=query
        
    result=result.replace('**','^')
    #parsed=(unicode(parsed).replace('**','^') if unicode(parsed)[0:2]!='0-' else unicode(parsed)[1:].replace('**','^'))
    query=(unicode(query) if unicode(query)!='0-' else unicode(query[1:]))
    wf.add_item(result,
                query,
                arg=result+','+query,
                icon='rightarrow.png',
                valid=True,
                largetext=result)

    
    for i in history:
        wf.add_item(i[0],
                    i[1],
                    autocomplete=i[1],
                    icon='history.png')
    wf.send_feedback()

if __name__==u"__main__":
    wf=Workflow()
    sys.exit(wf.run(main))

Edited by therockmandolinist
Link to comment

Can you edit your post to indent the code properly? It's Python, so indentation matters and the code as you've posted it is invalid.

 

Also, you're getting a line number in your traceback, so could you post it?

 

On top of that, the input (if there is any), the expected output and the actual output.

 

If I can't replicate the issue, it's usually almost impossible to fix.

Link to comment

A couple of things that I can discern from the unindented code:

  • Calling docopt with wf.args can lead to issues (though mostly when multiple arguments are permitted, IIRC). Docopt expects the raw, encoded-string arguments, not the Unicode objects in wf.args. That shouldn't be a problem here, but you're trying to decode a Unicode object with query=args.get('<query>').decode('UTF-8'). That's incorrect if you're passing wf.args to docopt. args.get('<query>') already returns a normalised Unicode string. It's also a better idea to use wf.decode() than str.decode('utf-8') because wf.decode() will also normalise the Unicode. Fundamentally with docopt, you should call docopt first and wf.decode() on the results.
  • history contains encoded strings, not Unicode. Right there, anything from history is a potentially non-ASCII timebomb waiting to blow up. It should be: historyList = wf.decode(historyFile.read()).splitlines()
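
The two points above come down to decoding once at the boundary and then working only with Unicode. A rough sketch of that idea follows. `decode_text` is a hypothetical stand-in for `wf.decode()` so that the example runs without Alfred-Workflow installed, and it assumes UTF-8 input and NFC normalisation; the history contents are made up for illustration.

```python
# -*- coding: utf-8 -*-
import unicodedata


def decode_text(value, encoding='utf-8', form='NFC'):
    """Hypothetical stand-in for wf.decode(): bytes in, normalised Unicode out."""
    if isinstance(value, bytes):
        # Encoded input (e.g. file contents or raw arguments): decode it.
        value = value.decode(encoding)
    # Already-Unicode input is only normalised, never decoded a second time.
    return unicodedata.normalize(form, value)


# Simulated contents of history.txt as read from disk (encoded bytes).
raw_history = u'1.4,γ\n9.81,g\n'.encode('utf-8')

# Decode the whole file once, then split and reverse as the script does.
history_lines = decode_text(raw_history).splitlines()
history = [tuple(line.split(u',')) for line in history_lines[::-1]]
```

Because the helper checks the type before decoding, passing it a value that is already Unicode is harmless, which avoids the double-decode problem described in the first bullet.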
Edited by deanishe
Link to comment

You can't use γ as a dictionary key in your script because you haven't specified a source encoding, so Python treats the file as ASCII.

Put

 

# encoding: utf-8

 

at the top of the script.

Other than that, I don't see any obvious issues other than the ones mentioned above.
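
As a small illustration of the declaration in context (a sketch only, with a trimmed-down copy of `var_dict` and a simulated query standing in for Alfred's input):

```python
# encoding: utf-8
# The declaration above has to be on the first or second line of the file.

# With the source encoding declared, non-ASCII Unicode literals are allowed.
var_dict = {u'gamma': 1.4, u'γ': 1.4, u'g': 9.81}

# Simulated query: the UTF-8 bytes for "γ", decoded to Unicode before lookup.
query = b'\xce\xb3'.decode('utf-8')
value = var_dict[query]
```

The lookup only works when both the key and the query are Unicode, which is why the input is decoded before it is used.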

Link to comment

Thanks for all the advice, it's always appreciated. Your suggestions did clear up a couple of my initial issues, but I then found that getting sympy's parse_expr function to recognize Unicode characters in Python 2 (esp. as custom var names) is not really optimal or easily possible. So for now I'm handling it with a substitution on the input to that function and a re-substitution on its output (replacing 'γ' with 'gamma' and back again). Sorta gives me more peace of mind that way anyway, no funky stuff.
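
A rough sketch of that substitution workaround, for anyone who lands here later. The `ALIASES` mapping and both helper names are hypothetical, and the actual script may do this differently. The reverse step matches whole names only, so a longer variable such as 'gammae' is left alone.

```python
# -*- coding: utf-8 -*-
import re

# Hypothetical mapping from Unicode symbols to ASCII stand-ins.
ALIASES = {u'γ': u'gamma'}


def to_ascii_names(expr):
    """Replace Unicode symbols with ASCII names before parsing."""
    for symbol, name in ALIASES.items():
        expr = expr.replace(symbol, name)
    return expr


def to_unicode_names(expr):
    """Restore the Unicode symbols in the parser's output."""
    for symbol, name in ALIASES.items():
        # Match the name only when it is not part of a longer identifier.
        pattern = u'(?<![A-Za-z_])' + re.escape(name) + u'(?![A-Za-z0-9_])'
        expr = re.sub(pattern, symbol, expr)
    return expr


ascii_query = to_ascii_names(u'2γ*R')
restored = to_unicode_names(u'2gamma*R + gammae')
```

One caveat with this approach: a query that places γ directly before other letters (for example 'γe') would collide with an existing ASCII name after substitution, so it relies on the variable names being chosen with that in mind.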

 

Thanks again for the help!

Link to comment
