The output escaping isn't the problem: he's letting the XML library do that, as would any Alfred library.
The big problem is that the text encoding is all over the place. Veritas, you're mixing unicode strings (string_to_regex) and UTF-8-encoded strings (everything else). You must encode the output to UTF-8 before printing it to Alfred.
Try this instead:
# -*- coding: utf-8 -*-
import sys
from xml.etree.ElementTree import Element, SubElement, tostring

# Decode the query from Alfred (UTF-8 bytes in Python 2) to unicode
string_to_regex = sys.argv[1].decode(u'utf-8')
regexified = []


# Builds each part of the XML tree
def build_xmltree(items):
    item = SubElement(items, u'item')
    title = SubElement(item, u'title')
    subtitle = SubElement(item, u'subtitle')
    icon = SubElement(item, u'icon')
    return (item, title, subtitle, icon)


# Backslash-escape the regex metacharacters
for i in string_to_regex:
    if i in u"\\[]().*+-^$|":
        regexified.append(u"\\" + i)
    else:
        regexified.append(i)

alfred = u''.join(regexified)

items = Element(u'items')
item, title, subtitle, icon = build_xmltree(items)
item.set(u'uid', u'regexified')
item.set(u'arg', alfred)
item.set(u'valid', u'yes')
title.text = alfred
subtitle.text = u'Copy to Clipboard'

# tostring() with an encoding returns an encoded byte string, ready to print
print tostring(items, encoding=u'utf-8')
And set up the script escaping so:
It should do what you want (if I've understood it correctly).
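Many of the unicode literals I've added aren't strictly necessary (tag names, 'utf-8' etc.), but it's a good habit to get into with Python 2. Otherwise, at some point you'll combine unicode with a byte string containing non-ASCII characters, Python 2 will try to convert between them using the ASCII codec, and you'll get a UnicodeDecodeError or UnicodeEncodeError.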
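To show what goes wrong when the two types meet, here's a small sketch. When Python 2 combines a unicode string with a byte string, it implicitly decodes the bytes as ASCII; the snippet does that decode explicitly so it behaves the same on Python 2 and 3. The sample value is made up for illustration:

```python
# -*- coding: utf-8 -*-
# UTF-8-encoded byte string, like an item of sys.argv on Python 2
utf8_bytes = b'caf\xc3\xa9'

# Python 2 does this decode implicitly for u'prefix: ' + utf8_bytes
try:
    utf8_bytes.decode('ascii')
    failed = False
except UnicodeDecodeError:
    failed = True

# Decoding explicitly with the right codec avoids the error
combined = u'prefix: ' + utf8_bytes.decode('utf-8')
```

With pure-ASCII input the implicit decode succeeds, which is why this kind of bug tends to stay hidden until someone types an accented character.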
Be sure to decode all external input (sys.argv, os.environ, sys.stdin etc.) to unicode immediately, and encode all output (usually to UTF-8) before you print/write it.
If you don't do that, any non-ASCII character will likely break your script.
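A minimal sketch of that decode-early, encode-late pattern. The helper names to_unicode and to_bytes are my own (not part of any Alfred library), and the sample bytes stand in for what Python 2 would hand you in sys.argv:

```python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals


def to_unicode(value, encoding='utf-8'):
    """Decode a byte string to unicode; pass unicode through unchanged."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def to_bytes(value, encoding='utf-8'):
    """Encode unicode to a byte string; pass bytes through unchanged."""
    if isinstance(value, bytes):
        return value
    return value.encode(encoding)


raw = b'caf\xc3\xa9'            # external input: UTF-8 bytes
text = to_unicode(raw)          # decode immediately, work in unicode
output = to_bytes(text + '!')   # encode once, just before writing
```

Because both helpers pass through values that are already the right type, it's safe to call them on anything at the edges of the script without checking first.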
If you're working with filepaths, you also have to call unicodedata.normalize(u'NFD', u'your unicode path here') or weird things will happen (on OS X).
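A quick illustration of why the normalization matters. As far as I know, the OS X filesystem gives you filenames in decomposed form, so a path typed by the user (usually composed) won't compare equal to one read from disk unless you normalize both. The sample string is made up:

```python
# -*- coding: utf-8 -*-
import unicodedata

composed = u'caf\xe9'  # 'é' as a single code point (NFC form)
decomposed = unicodedata.normalize(u'NFD', composed)  # 'e' + combining accent

# The two look identical when printed but are different strings
same = (composed == decomposed)

# Normalizing both sides to the same form makes them comparable again
same_after = (unicodedata.normalize(u'NFD', composed) == decomposed)
```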