
Sensible defaults or a clear explanation of Escaping options


deanishe


Thanks Dean!

 

Because we are wrapping in double quotes, can a space be put around the = sign for all of the language types without adding whitespace (to make things look more consistent)?

 

Cheers,

Andrew

You can't put spaces around = in bash. Has to be:

query="{query}"
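To see both points at once (no spaces around = in bash, and what escaping inside double quotes has to cover), here is a small sketch that substitutes a query into a bash template and runs it. It assumes bash is on the PATH; `escape_for_bash_dq` and `run_bash_template` are hypothetical helpers that only approximate double-quote escaping, not Alfred's actual implementation. The test string is modelled on the one used later in this thread.

```python
import subprocess

# Characters that keep a special meaning inside bash double quotes.
# Assumption: a sketch of what double-quote escaping has to cover,
# not Alfred's actual escaping code.
BASH_DQ_SPECIALS = '\\"$`'


def escape_for_bash_dq(text):
    """Backslash-escape each character bash would interpret inside "..."."""
    return "".join("\\" + ch if ch in BASH_DQ_SPECIALS else ch for ch in text)


def run_bash_template(template, query):
    """Substitute {query} into a bash script template and run it with bash."""
    script = template.replace("{query}", escape_for_bash_dq(query))
    result = subprocess.run(["bash", "-c", script], capture_output=True, text=True)
    return result.stdout, result.stderr


query = "< `\"\\();$bob'>"

# No spaces around "=": the assignment works and the value round-trips.
out, _ = run_bash_template('query="{query}"\nprintf "%s" "$query"', query)
print(out == query)  # True

# With spaces, bash looks for a command named "query" instead,
# so the variable is never set and printf prints an empty string.
out, err = run_bash_template('query = "{query}"\nprintf "%s" "$query"', query)
print(repr(out), "command not found" in err)  # '' True
```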

Perfect, thanks!

 

I've just updated that list. I also removed the escaping of dollar signs for Python, then tested each set of defaults with the query string... < `"\();$bob'> and they all output the exact string.
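The reason dollars need no escaping for Python can be checked directly: inside a double-quoted Python literal only backslashes and double quotes are special, and spaces around = are fine. This is a minimal sketch assuming a template of the form `query = "{query}"` and a query without newlines; `substitute` and `escape_for_python_dq` are hypothetical helpers, not Alfred's code.

```python
import ast


def escape_for_python_dq(text):
    """Escape text for use inside a double-quoted Python string literal.

    Only backslashes and double quotes need escaping here; "$" and "`"
    are ordinary characters in Python strings (unlike bash).
    Assumption: the query contains no newlines.
    """
    return text.replace("\\", "\\\\").replace('"', '\\"')


def substitute(template, query):
    """Hypothetical helper: replace {query} with the escaped query."""
    return template.replace("{query}", escape_for_python_dq(query))


query = "< `\"\\();$bob'>"
source = substitute('query = "{query}"', query)

# Parse the generated line and evaluate only the string literal
# (ast.literal_eval avoids running arbitrary code).
assignment = ast.parse(source).body[0]
value = ast.literal_eval(assignment.value)
print(value == query)  # True
```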

 

Does the final list look good?

 

Thanks for the original suggestion on this, it should really ease getting started with scripts.

 

Cheers,

Andrew


  • 1 month later...

I think the intention of setting LC_CTYPE=UTF-8 was great, but unfortunately the value must also include a locale (e.g. en_US.UTF-8). Otherwise, if the value can't be parsed, Python will crash at locale.getdefaultlocale() with ValueError: unknown locale: UTF-8.

 

Andrew reverted that change in 2.7.1 but I would like to discuss whether it would be possible to set LC_CTYPE including the system locale. My temporary workaround to fix the workflow in 2.7 was export LC_CTYPE="$(defaults read -g AppleLocale).UTF-8" where the AppleLocale setting is a string value formatted like en_US. Technically the user could set different formats for each of the LC_* environment variables, though I'm not sure how common it would be since only a few changes are possible in Language & Region.
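A Python version of that shell workaround might look like the sketch below. `read_apple_locale` only works on macOS, and dropping anything after "@" is a simplifying assumption; both helper names are hypothetical. As discussed further down, the resulting name may still not be a locale the system supports.

```python
import subprocess


def read_apple_locale():
    """Return the AppleLocale global default (macOS only), e.g. "en_US"."""
    result = subprocess.run(["defaults", "read", "-g", "AppleLocale"],
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()


def lc_ctype_from_apple_locale(apple_locale, encoding="UTF-8"):
    """Build an LC_CTYPE value such as "en_US.UTF-8" from an AppleLocale value.

    Simplification (assumption): anything after "@" is dropped so that only
    language_TERRITORY remains. The result can still name a locale the
    system doesn't provide, so check it before relying on it.
    """
    base = apple_locale.strip().split("@", 1)[0]
    return "{}.{}".format(base, encoding)


# On a Mac, the equivalent of the shell workaround would be:
# os.environ["LC_CTYPE"] = lc_ctype_from_apple_locale(read_apple_locale())
```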

 

I have a bug report regarding unicode strings in XML output that could benefit from some follow-up testing. Since the current version of the workflow still includes the workaround above I will see whether the user is still having trouble after UTF-8 is specified. Will let you know if this fixes that problem.


I'm sure that I could set the LC_CTYPE to include the locale, but there are a few unanswered questions at the moment...

 

1. The locale should be set to the user's location. Will this affect workflows? e.g. Mine would be en_GB.UTF-8 rather than en_US.UTF-8.

2. Does a [locale].UTF-8 exist for ALL locales, and if it doesn't, would Python crash if set for that locale?

3. If it were set to e.g. en_US.UTF-8 for ALL users, would this break non US locales?


Those are tough to answer definitively; here's a try:

  • Yes, but I would hope that in most cases this would positively affect workflows by improving the ability to localize, though there are also cases to the contrary. For example, if someone used %A in a strftime format to get the day of the week name, then operated on that name under the assumption that it would be English.
  • Not every language and country combination works; see below.
  • We would have to figure out whether en_US was always the default locale (whether it came from this setting or elsewhere) in previous versions of Alfred.
I'm not sure how much is done with the locale by widely-used native python modules – datetime springs to mind as a fairly obvious way to test locales since it supports locale-specific date representations. The locale can affect sorting, regular expressions, and other operations but in most cases I believe that locale-sensitive operations are opt-in (e.g. must use the re.LOCALE flag or must load the locale explicitly by type).

 

Another question for Dean and anyone else more familiar with these things: since LC_CTYPE is only one of several variables that control localization, is it better to set the higher-level LANG variable? It seems that the system default is to set LANG, allowing overrides for each category if necessary. Python will take LC_ALL, otherwise LC_CTYPE, otherwise LANG as the default locale. Note that LC_ALL takes precedence over all the LC_* variables, so we definitely don't want to set that. The other categories are ignored by default unless loaded explicitly with locale.setlocale(locale.LC_*, ''), where the blank string tells Python to load the setting from the environment. For example,

> LANG=en_US.UTF-8 LC_TIME=es_ES.UTF-8 python -c "import locale;from datetime import datetime;print datetime.now().strftime('%A');locale.setlocale(locale.LC_TIME, '');print datetime.now().strftime('%A')"
Wednesday
miércoles
 

In this case, even setting LANG or LC_CTYPE to es_ES.UTF-8 still results in English for the first date format. It seems that workflow authors have to explicitly opt-in to non-English localization in Python. This is just the datetime module, other things could operate differently. I doubt the same opt-in policy holds true in all other languages.
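The precedence described above (LC_ALL, then the category's own variable, then LANG) can be written out as a small resolver. This is a sketch only: `effective_locale` is a hypothetical helper, it ignores Python-specific extras such as LANGUAGE, and it only reports which value would apply; as the example shows, Python still won't use LC_TIME until you call setlocale(LC_TIME, '').

```python
import os


def effective_locale(category_var, env=None):
    """Return (variable_name, value) that determines a locale category.

    Precedence: LC_ALL overrides everything, then the category's own
    variable (e.g. LC_TIME), then LANG. Empty values are skipped. When
    nothing is set, "C" is returned as the usual default (POSIX leaves
    the exact default to the implementation).
    """
    env = os.environ if env is None else env
    for name in ("LC_ALL", category_var, "LANG"):
        value = env.get(name)
        if value:
            return name, value
    return None, "C"


env = {"LANG": "en_US.UTF-8", "LC_TIME": "es_ES.UTF-8"}
print(effective_locale("LC_TIME", env))   # ('LC_TIME', 'es_ES.UTF-8')
print(effective_locale("LC_CTYPE", env))  # ('LANG', 'en_US.UTF-8')
```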

 

Finally, using printenv in a fresh terminal I noticed that some settings in the Language & Region pane cause OS X to set strange environment variables. For example, select English as the language, Canada as the country, then in advanced options choose French as the number and date format. One might expect to see LANG=en_CA and LC_TIME=fr_CA (along with LC_NUMERIC, etc), but instead OS X sets LANG=fr_CA. The AppleLanguages global default still shows en as the preferred language for text but there is no environment variable corresponding to that.

 

Another example is English as the language, United States as the country, and Spanish as the number and date format language. While we would expect LANG=es_US following the behavior of the previous example, the only environment variable set is LC_CTYPE=UTF-8. Ouch, that's the one that bit Alfred in 2.7. The AppleLocale global default shows es_US. I think that speaks to your second question, some locale combinations are not supported and it is possible for real users to configure their systems in that way.

 

Unfortunately a bad combination like es_US will also break stuff in Python:

> LC_CTYPE=es_US.UTF-8 python -c "import locale;locale.setlocale(locale.LC_CTYPE,'')"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/locale.py", line 547, in setlocale
    return _setlocale(category, locale)
locale.Error: unsupported locale setting
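Given that some combinations fail like this, a workflow could probe a locale before using it. A minimal sketch follows; `locale_is_usable` and `first_usable_locale` are hypothetical helpers, and which names succeed depends entirely on the locales installed on the machine running it.

```python
import locale


def locale_is_usable(name, category=locale.LC_CTYPE):
    """Return True if setlocale() accepts `name` for `category`.

    setlocale() changes process-wide state, so the previous setting is
    restored before returning.
    """
    previous = locale.setlocale(category)
    try:
        locale.setlocale(category, name)
        return True
    except locale.Error:
        return False
    finally:
        locale.setlocale(category, previous)


def first_usable_locale(candidates, fallback="C"):
    """Hypothetical helper: pick the first candidate the system supports."""
    for name in candidates:
        if locale_is_usable(name):
            return name
    return fallback


print(first_usable_locale(["es_US.UTF-8", "en_US.UTF-8"]))
```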
Edited by ipaterson

So, the aim of setting LC_CTYPE (which was my idea, and setting it to UTF-8 was also my (bad) idea) was to get the various languages that pull encoding information from the environment to default to UTF-8 (which is what Alfred uses) instead of ASCII, which is causing many of the encoding problems.

 

That is to say, the intention is to change IO encoding to UTF-8 and nothing else.
 
I'm afraid I didn't test it with locale.getdefaultlocale() because it's basically useless in workflows (it gives (None, None)) unless you call locale.setlocale() yourself first. It does work with sys.stdout.encoding and Py3's decoding magic. (The Python locale and encoding libraries seem to have very different ideas about what values are valid.)
 
I'm terribly sorry about providing a wrong value, but I still feel strongly that this is something that should be done if at all possible, as it will eliminate a lot of the encoding issues that have come up with Python, Ruby and Perl workflows.
 
The root issue is that Alfred uses UTF-8 but tells any POSIX-compliant languages by omission that IO is ASCII. As a result, encoding is basically always on full-manual and thus something that workflow developers need to worry about. In many cases, depending on language, your script works fine with bytestrings (pretty much everything is UTF-8 anyway), but utilities like pbcopy and pbpaste won't work properly unless IO encoding is explicitly set. Anything that tries to do text processing correctly (i.e. needs to know the encoding) requires you to set IO encoding (or do en-/decoding) manually because the implicit default is wrong.
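Doing the encoding manually, as described above, looks roughly like this in Python 3: bypass the text layer and write UTF-8 bytes yourself. This is a sketch with a hypothetical helper name; on Py2 the equivalent is writing text.encode('utf-8') to sys.stdout.

```python
import io
import sys


def write_utf8(text, stream=None):
    """Write text as UTF-8 bytes, ignoring whatever encoding the
    environment led Python to choose for sys.stdout."""
    if stream is None:
        stream = sys.stdout.buffer  # the binary stream under sys.stdout
    stream.write(text.encode("utf-8"))
    stream.flush()


buffer = io.BytesIO()
write_utf8("\u00fcnic\u00f6de", buffer)
print(buffer.getvalue())  # b'\xc3\xbcnic\xc3\xb6de'
```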
 
As has been noted, using AppleLocale isn't a viable solution because it often has a value like en_IT if an Italian wants his/her apps to be in English (some of Apple's translations are weird). A further effect of this is that even if your language (i.e. not Python) doesn't choke on the value, you don't know if you should follow the en part or the IT part…
 
What I've done in the past is use AppleLanguages and "massage" the results into a useable locale. That was perfect for the situations I had, but would be a rotten general solution.
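One possible way to do that massaging, purely as an illustration (this is not the code referred to above; the helper name and the rules are assumptions): turn an AppleLanguages entry such as "en-GB" into a POSIX-style name, skip script subtags, and give up when there is no region.

```python
def locale_from_apple_language(tag, encoding="UTF-8"):
    """Hypothetical helper: turn an AppleLanguages entry into a POSIX locale.

    "en-GB" -> "en_GB.UTF-8"; a script subtag such as the "Hans" in
    "zh-Hans-CN" is skipped. Returns None if there is no two-letter region,
    because the country can't be guessed reliably from the language alone.
    """
    parts = tag.replace("_", "-").split("-")
    language = parts[0].lower()
    regions = [p.upper() for p in parts[1:] if len(p) == 2 and p.isalpha()]
    if not language or not regions:
        return None
    return "{}_{}.{}".format(language, regions[0], encoding)


print(locale_from_apple_language("en-GB"))       # en_GB.UTF-8
print(locale_from_apple_language("zh-Hans-CN"))  # zh_CN.UTF-8
print(locale_from_apple_language("fr"))          # None
```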
 

I'm sure that I could set the LC_CTYPE to include the locale, but there are a few unanswered questions at the moment...
 
1. The locale should be set to the user's location. Will this affect workflows? e.g. Mine would be en_GB.UTF-8 rather than en_US.UTF-8.
2. Does a [locale].UTF-8 exist for ALL locales, and if it doesn't, would Python crash if set for that locale?
3. If it were set to e.g. en_US.UTF-8 for ALL users, would this break non US locales?

 
1. Depends on the workflow. Most aren't locale-aware (after all, there is no locale in Alfred). Changing the locale will change the way some languages behave, as ipaterson said. Any workflow that cares about locales probably has to set its own (perhaps Apple's own languages are aware of the OSX-level settings and don't use POSIX locales from env vars?)
 
2. No, there isn't such a locale. A locale is primarily the language :(
 
3. This is the important question (I think)!
 
Obviously, en_US (or any locale) is going to be wrong for many users. On the other hand, the locale is currently effectively C (the POSIX default if no locale is specified). So, we could reframe the question as "Is the C.UTF-8 locale available and useable?" If so, setting LANG=C.UTF-8 would fix the encoding and change nothing else.
 
From my tests, C.UTF-8 doesn't appear to work on OS X :(
 
However, as best as I can determine, there don't appear to be very many differences between C and en_US. At least, not as regards LC_CTYPE. Setting LANG to en_US might result in Fahrenheit popping up somewhere.
 
Would it be possible to try a beta with LC_CTYPE=en_US.UTF-8? That is, at least, definitely a valid value, and the closest one to the C locale that is assumed by default.
 
As far as Python goes, setting PYTHONIOENCODING=UTF-8 does the trick. On Py2 that just means that sys.stdout etc. have their encoding set to UTF-8 instead of ASCII, so print(u'ünicöde') works. On Py3, it means all the decoding magic Py3 does for you will work properly with Alfred's IO.
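A quick way to see the PYTHONIOENCODING effect is to launch a child interpreter with the variable added to its environment, roughly as a launcher such as Alfred could. This sketch uses Python 3 and a hypothetical helper name; the child reports its stdout encoding and prints non-ASCII text, which the parent receives as raw UTF-8 bytes.

```python
import os
import subprocess
import sys


def run_with_utf8_io(code):
    """Run a Python snippet in a child process with PYTHONIOENCODING=UTF-8
    added to its environment."""
    env = dict(os.environ, PYTHONIOENCODING="UTF-8")
    result = subprocess.run([sys.executable, "-c", code], env=env,
                            capture_output=True, check=True)
    return result.stdout  # raw bytes, exactly what the parent receives


raw = run_with_utf8_io(
    "import sys; print(sys.stdout.encoding); print('\\u00fcnic\\u00f6de')")
encoding_name, text = raw.decode("utf-8").splitlines()
print(encoding_name)
print(text == "\u00fcnic\u00f6de")  # True
```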
 
Regarding locales, timezones etc., Python is a bit rubbish. Its locale support is marginally useful (it's global), and it doesn't understand timezones. The tzinfo attribute in datetime is a placeholder that other libraries, like dateutil and pytz, populate.

Edited by deanishe

Well, PHP seems to follow a similar pattern of opt-in localization to Python. Unlike Python, in your example the locale does affect basic number formatting, as documented (use %F to avoid localization):

> LC_NUMERIC=de_DE.UTF-8 php -r "printf('%f', 123.456);"
123.456
> LC_NUMERIC=de_DE.UTF-8 php -r "setlocale(LC_NUMERIC, '');printf('%f', 123.456);"
123,456

However, that does not seem to apply globally, as some other common functions expose parameters that you have to set in order to emulate locales. Folks recommend using localeconv() to figure out the correct values for the decimal and thousands separators. It seems PHP 5 has a class to handle number formatting in a less ridiculous way.

> LC_NUMERIC=de_DE.UTF-8 php -r "print number_format(123456);"
123,456
> LC_NUMERIC=de_DE.UTF-8 php -r "setlocale(LC_NUMERIC, '');print number_format(123456);"
123,456
> php -r "print number_format(123456,2,',','.');"
123.456,00
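For comparison, a rough Python analogue of that last number_format() call. The helper below is hypothetical: it formats with "," and "." first (locale-independent) and then swaps in the requested separators through a placeholder. Rounding of exact halfway cases may not match PHP's.

```python
def number_format(number, decimals=0, dec_point=".", thousands_sep=","):
    """Rough Python analogue of PHP's number_format(); hypothetical helper."""
    formatted = f"{number:,.{decimals}f}"  # e.g. "123,456.00"
    # Use a placeholder so the two separators can't collide when swapped.
    return (formatted.replace(",", "\0")
                     .replace(".", dec_point)
                     .replace("\0", thousands_sep))


print(number_format(123456))               # 123,456
print(number_format(123456, 2, ",", "."))  # 123.456,00
```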

Huge thanks for the discussion on this, chaps. I'm sure we will get a good default solution!

 

One thing I may look into is offering a global editable environment for workflows so that users can set up environment variables which are applied when workflow scripts are run. This would also allow for experimenting with options so I can provide a decent set of defaults (even if the defaults are automatically generated for a user).

 

Cheers,

Andrew


I like the idea of being able to set environment variables. With a good UI, it's a great way to save workflow settings.

 

OTOH, as ipaterson said, some languages will print 1.2 as "1,2" in certain locales, and as Alfred workflows communicate via strings, that could break some workflows.
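In Python at least, the string formatting most workflows use is locale-independent; only the explicitly locale-aware functions change the decimal point. A minimal sketch (the de_DE behaviour mentioned in the comment assumes that locale is installed):

```python
import locale

value = 1.2

# %-formatting, str.format() and f-strings ignore the locale entirely,
# so the decimal point is always ".".
print("%f" % value)    # 1.200000
print(f"{value:.1f}")  # 1.2

# locale.format_string() and the "n" format type do use LC_NUMERIC.
# Pinning LC_NUMERIC to "C" keeps their output predictable; under an
# installed de_DE locale they would use "," instead.
locale.setlocale(locale.LC_NUMERIC, "C")
print(locale.format_string("%.1f", value))  # 1.2
```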

 

My motivation for wanting an encoding in the environment was to simplify workflow development, so while I think the ability to set env vars is pretty cool, I think I'd stay away from LC_NUMERIC and the like. There's also the risk of someone changing PATH and breaking things.
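If user-editable variables were merged into the script environment, one way to address the PATH concern is a protected list plus encoding-only defaults. Everything here is hypothetical (names, the protected set, the defaults), sketched from the suggestions in this thread rather than from how Alfred actually builds its environment.

```python
import os

# Hypothetical: variables a user-editable workflow environment may not
# override, following the concern about someone changing PATH.
PROTECTED = frozenset({"PATH"})

# Hypothetical defaults reflecting the encoding-only goal of this thread:
# en_US.UTF-8 (suggested above as a valid value close to C) and a
# Python-specific IO encoding, but nothing like LC_NUMERIC or LC_ALL.
DEFAULTS = {"LC_CTYPE": "en_US.UTF-8", "PYTHONIOENCODING": "UTF-8"}


def build_script_env(user_vars, base=None):
    """Merge defaults and user-defined variables into a copy of `base`."""
    env = dict(os.environ if base is None else base)
    for name, value in DEFAULTS.items():
        env.setdefault(name, value)  # keep any value already in `base`
    for name, value in user_vars.items():
        if name not in PROTECTED:
            env[name] = value
    return env


env = build_script_env({"PATH": "/tmp", "MY_SETTING": "1"},
                       base={"PATH": "/usr/bin:/bin"})
print(env["PATH"], env["MY_SETTING"], env["LC_CTYPE"])
```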

Edited by deanishe

On a related note that might speak to user expectation, I noticed that the Alfred calculator respects my numeric separator settings. These settings must have been set when I changed to the fr_CA locale yesterday. Despite my switching back to en_US, the OS didn't revert my number formatting, so I was a bit surprised when I tried to use the calculator. Here is a simple example from when my system was set to use spaces as the thousands separator (here we see "." instead of a space, but I think that is common) and commas for decimals:

 

[Screenshots: Alfred calculator results showing the thousands separator and the decimal separator]

 

None of the LANG/LC_* variables would have reflected this preference (they're en_US). Rather, the separators setting is stored in the AppleICUNumberSymbols global default. Ultimately it would be pretty tough for a workflow to always do the right thing. A guide on localization and helper methods from the various workflow libraries would be awesome for workflows that want/need to embrace international users without manually tiptoeing around all of these gotchas. I agree with Dean that Alfred's place is to ensure that the required encodings work with as minimal change to previous localization behavior as possible.

Edited by ipaterson

Did you restart Alfred after changing the locale back to en_US?
 
Personally, I think it's best to pretend that localisation isn't really a thing as far as workflows go, and Alfred shouldn't set LANG, LC_ALL or LC_NUMERIC. Having %f be 1,2 on some systems and 1.2 on others seems like it could be quite a big gotcha.
 
Localisation would be a nice feature, but as long as Alfred is English-only, I don't think internationalising workflows should be a priority: the users are by definition okay with English-language apps.
 
Like you say, it's tough to know what the right thing to do is, anyway. If the locale is en_IT, do you use the English or Italian versions of IMDB, Amazon or Google?
 

A guide on localization and helper methods from the various workflow libraries would be awesome for workflows that want/need to embrace international users without manually tiptoeing around all of these gotchas.

 

There are no helper methods for localisation in workflow libraries, are there?

 

What would make sense? I suppose getting a useable locale might be a useful feature for Alfred-Workflow. I'm not sure it would be a much-used one, though …


Yeah, I do think that features like that would be woefully underused. Worth polling your userbase on GitHub in the v2 thread? We should probably start a new thread here if you want to discuss specifics with regard to workflows.

For the calculator formatting, I meant that OSX still had spaces and commas in my system settings despite switching back to en_US, nothing was stuck in Alfred. It reset to US format when I made the decimals change manually in OSX prefs.


I don't think there'd be any interest at all in localisation. I mean, the majority of the regular AW posters/contributors aren't native English-speakers but none has expressed any interest in that direction, and there haven't been any related pull requests, either.
 
I just don't envisage localisation becoming a thing with workflows as long as Alfred itself is English-only. I only know of a couple of workflows that care at all about locales, and none for purposes of localisation.
 
What could AW realistically do to help? Give you a locale and perhaps a locale-aware number formatting class? Most of the work is in fixing your strings for gettext and handling plurals, and then doing the actual translation work.
