mdreizin Posted April 16, 2013 (edited)

The app seems to have an issue related to incorrect encoding of {query} / script block. For instance, when I type the symbol "й", the app sends it as 4 bytes, but it is actually 2 bytes in UTF-8.

Could you please explain: How does the app encode {query} / script block? Which encodings does it use for that?

If you run "init.rb" from the terminal, you get:

    ruby -Ku "init.rb" "й"
    <?xml version="1.0"?><items><item uid="stats-0" valid="yes"><title>System</title><subtitle>hex: ["d0b9"], size: 2</subtitle><icon>icon.png</icon></item><item uid="stats-1" valid="yes"><title>Query</title><subtitle>hex: ["d0b9"], size: 2</subtitle><icon>icon.png</icon></item></items>

But if you run the same code via the workflow, you get:

    <?xml version="1.0"?><items><item uid="stats-0" valid="yes"><title>System</title><subtitle>hex: ["d0b9"], size: 2</subtitle><icon>icon.png</icon></item><item uid="stats-1" valid="yes"><title>Query</title><subtitle>hex: ["d0b8cc86"], size: 4</subtitle><icon>icon.png</icon></item></items>

You can find the workflow here.

OS X 10.8.3
Alfred 2.0.2 (178)
Ruby 1.8.7 / 1.9.x / 2.0.x
Andrew Posted April 16, 2013

Alfred uses NSTask to bridge across to the scripting language. Cocoa automatically normalises any passed-in arguments using decomposition, which splits composed characters into their parts, as you see. You'll need to re-normalise into the form you need.

http://en.wikipedia.org/wiki/Unicode_equivalence#Normalization

I've done plenty of research on this. One beta build even had a way to set the normalisation type in the workflow, but NSTask always re-normalised, so it was removed.

Cheers,
Andrew
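A minimal sketch of the re-normalisation step described above, written in Ruby since the original script is init.rb. It assumes Ruby 2.2 or later, because String#unicode_normalize is not available on the 1.8.7 / 1.9.x / 2.0.x versions listed in the first post. The helper names hex_bytes and normalise_query are hypothetical and not part of Alfred or of the original workflow.

```ruby
# Sketch only: assumes Ruby 2.2+ (String#unicode_normalize).
# hex_bytes and normalise_query are hypothetical helper names.

# Returns the raw bytes of a string as one lowercase hex string.
def hex_bytes(str)
  str.unpack("H*").first
end

# Treats the incoming argument as UTF-8 and converts it to composed (NFC) form.
def normalise_query(raw)
  raw.dup.force_encoding(Encoding::UTF_8).unicode_normalize(:nfc)
end

composed   = "\u0439"        # "й" as a single code point (NFC)
decomposed = "\u0438\u0306"  # "и" followed by a combining breve (NFD)

puts hex_bytes(composed)                     # d0b9
puts hex_bytes(decomposed)                   # d0b8cc86
puts hex_bytes(normalise_query(decomposed))  # d0b9
```

In a workflow script, the same call could be applied to the first command line argument before it is used, for example normalise_query(ARGV[0]).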
mdreizin Posted April 16, 2013 (edited)

Andrew, thanks a lot for the reply. I will try to re-normalise the strings in my code.
Andrew Posted April 17, 2013

I've created a small command line tool which should hopefully help you re-normalise any strings:

https://dl.dropboxusercontent.com/u/6749767/Alfred/normalise.zip

If you include this in your workflow itself, you should be able to run it directly like this:

    usage: ./normalise -form NFC й

You can add -verbose after NFC to see what is happening, or run it with no arguments to see the options.

Let me know if that helps at all.
hubertcampan Posted April 21, 2013

Hi Andrew,

It did help me with this topic (pointed out by _mk_):

http://www.alfredforum.com/topic/1907-bash-script-in-workflow-language-of-locals-accented-characters/?p=11799

Thanks.
Hubert
Andrew Posted April 21, 2013

Excellent - I tried to find your topic to post it there too, glad you found it!
mdreizin Posted May 5, 2013

Hi Andrew,

It helped me resolve my issue. Thanks a lot.
m0nah Posted September 12, 2013

Andrew, it helped me resolve my issue too. Thanks!
mklement0 Posted May 12, 2014 (edited)

@Andrew's normalise utility works great, but I've since found that there is an alternative using the standard utility iconv with the (somewhat obscurely named) UTF8-MAC encoding scheme.

Note: The following examples use bash. iconv expects its input via a filename or stdin.

Applied to the example above:

    # Converts NFD form of 'й' to NFC form
    iconv -f UTF8-MAC <<<'й'

Some background: the following examples use the input string 'ü' to demonstrate converting:

- in NFC form, $'\xc3\xbc', i.e., bytes 0xC3 0xBC, which is the UTF-8 encoding of Unicode code point 0xFC
- in NFD form, $'u\xcc\x88', i.e., a u (the base character) followed by bytes 0xCC 0x88, which is the UTF-8 encoding of Unicode code point 0x308, the so-called combining diaeresis (¨)

Note that in Terminal the result will always appear as ü; pipe to hexdump -C, for instance, to see the byte values.

    # NFC -> NFD
    iconv -t UTF8-MAC <<<$'\xc3\xbc' # -> $'u\xcc\x88'

    # NFD -> NFC
    iconv -f UTF8-MAC <<<$'u\xcc\x88' # -> $'\xc3\xbc'

These conversions are safe to use in that if the input string is already in the target format, it is left as is.
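For comparison, the same 'ü' round trip can be sketched in Ruby without shelling out to iconv. This assumes Ruby 2.2 or later and uses the standard Unicode NFC/NFD forms, which may not match Apple's UTF8-MAC variant for every character. The names NFC_FORM, NFD_FORM and hex_dump are hypothetical.

```ruby
# Sketch only: assumes Ruby 2.2+ (String#unicode_normalize).
# Uses standard NFC/NFD, which is an assumption; UTF8-MAC may differ for some characters.

NFC_FORM = "\u00FC"   # 'ü' as one code point, bytes c3 bc
NFD_FORM = "u\u0308"  # 'u' followed by a combining diaeresis, bytes 75 cc 88

# Formats the bytes of a string as space-separated hex pairs.
def hex_dump(str)
  str.bytes.map { |b| format("%02x", b) }.join(" ")
end

# NFC -> NFD
puts hex_dump(NFC_FORM.unicode_normalize(:nfd))  # 75 cc 88

# NFD -> NFC
puts hex_dump(NFD_FORM.unicode_normalize(:nfc))  # c3 bc

# Input that is already in the target form is returned unchanged.
puts NFC_FORM.unicode_normalize(:nfc) == NFC_FORM  # true
puts NFD_FORM.unicode_normalize(:nfd) == NFD_FORM  # true
```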
chadv Posted June 19, 2014 (edited)

@mklement0 Thanks for this tip. iconv works great for most strings, but I found that it does not work for some emoji, Pile of Poo for instance. It gives the following error:

    [ERROR: alfred.workflow.input.scriptfilter] Code 0: iconv: (stdin):1:4: cannot convert

It's a shame, because it's otherwise such an elegant solution. For posterity, I'll add that this is happening in OS X 10.9.3 (libiconv 1.11); hopefully a future version fixes this.

@Andrew's script seems to handle all emoji correctly. I'm going to use that for now.
mklement0 Posted August 11, 2014 (edited August 23, 2014)

@chadv: Thanks for investigating and letting me know. A shame indeed, especially given that it hasn't been fixed in OS X 10.10 (the current public beta), which still ships with the same libiconv version (1.11).

Curiously, 3- and 6-byte UTF-8 emoji sequences, as well as those 7-byte sequences that start with an ASCII character byte (followed by combining characters), do work properly, but the majority of emoji (4-byte sequences) do not.

On a side note, Terminal.app, while *rendering* emoji as expected, doesn't handle them properly in terms of cursor placement, printing the next character, and backspacing. 6-, 7-, and 8-byte sequences seemingly involve combining characters and are misinterpreted as comprising *2 or 3* characters, which has all sorts of unwanted side effects.
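To make the byte lengths above concrete, here is a small Ruby sketch that prints the UTF-8 byte count and code point count for a few emoji sequences, and checks that NFC normalisation leaves them unchanged. The post does not say which emoji were tested, so the entries in SAMPLES are illustrative assumptions chosen only to match the stated lengths. Ruby 2.2 or later is assumed.

```ruby
# Sketch only: assumes Ruby 2.2+ (String#unicode_normalize).
# SAMPLES is a hypothetical selection; the thread does not list the emoji tested.

SAMPLES = {
  "sun"                      => "\u2600",               # 3 bytes, 1 code point
  "sun + variation selector" => "\u2600\uFE0F",         # 6 bytes, 2 code points
  "keycap 1"                 => "1\uFE0F\u20E3",        # 7 bytes, starts with ASCII '1'
  "pile of poo"              => "\u{1F4A9}",            # 4 bytes, 1 code point
  "flag (two indicators)"    => "\u{1F1FA}\u{1F1F8}"    # 8 bytes, 2 code points
}

SAMPLES.each do |label, text|
  # None of these sequences has a canonical decomposition, so NFC is a no-op.
  unchanged = text.unicode_normalize(:nfc) == text
  puts format("%-26s %d bytes, %d code points, NFC unchanged: %s",
              label, text.bytesize, text.length, unchanged)
end
```

The code point counts (2 or 3 for the longer sequences) line up with the number of characters Terminal.app is described as seeing, although that correspondence is an observation and not something the thread confirms.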