yekong Posted August 17, 2014 Share Posted August 17, 2014 (edited)

Dear all,

I'm using the latest Alfred (v2.4 (279)), and I have been trying to write my first Alfred workflow with Python, inspired by the tutorial at http://www.deanishe.net/alfred-workflow/tutorial.html#creating-a-new-workflow . I've tried two Alfred Python utilities, https://github.com/deanishe/alfred-workflow and https://github.com/nikipore/alfred-python, but NEITHER of them works for me; both fail with the following error:

[ERROR: alfred.workflow.input.scriptfilter] XML Parse Error 'The operation couldn’t be completed. (NSXMLParserErrorDomain error 4.)'. Row (null), Col (null): 'Document is empty' in XML:

The strange thing is that the code very occasionally works, but most of the time it just doesn't! I don't think the problem is in the two Python utilities themselves, so I must be missing something here. I've also seen this post, but had no luck with it: http://www.alfredforum.com/topic/4238-xml-parse-error-in-alfred-22-fixed-23/?hl=%2Berror%3A+%2Balfred.workflow.input.scriptfilter

If anyone knows what the issue is, please kindly post a reply. MUCH THANKS!

The code I've written so far is very short, so I'll just paste it here: one version using the first Python utility, the other using the second. The Script Filter script is the same for both: python baidu_now.py "{query}". I'm using Python 2.7.5.
==============first===================

```python
# encoding: utf-8
from workflow import Workflow, ICON_WEB, web
from BeautifulSoup import *
import sys

reload(sys)
sys.setdefaultencoding('utf-8')


def request_baidu_search(query):
    url = u'http://www.baidu.com/s?wd={query}'.replace(u'{query}', query)
    r = web.get(url)
    r.raise_for_status()
    return parse_baidu_results(r.content)


def parse_baidu_results(content):
    soup = BeautifulSoup(content)
    tables = soup.findAll('div', {'class': 'result c-container '})
    results = []
    for table in tables:
        part1 = table.find(attrs={'class': 't'})
        title = part1.a.renderContents()
        title = title.replace('<em>', '').replace('</em>', '')
        print title
        url = u'http:' + part1.a['href']
        part2 = table.find('div', {'class': 'c-abstract'})
        desc = part2.renderContents()
        results.append((title, url, desc))
    return results


def main(wf):
    query = wf.args[0]

    def wrapper():
        return request_baidu_search(query)

    results = request_baidu_search(query)  # wf.cached_data('results', wrapper, max_age=60)
    for result in results:
        wf.add_item(
            title=result[0],
            subtitle=result[1],
            arg=result[1],
            valid=True,
            icon=ICON_WEB)
    wf.send_feedback()


if __name__ == '__main__':
    wf = Workflow()
    sys.exit(wf.run(main))
```

=========== second ===============

```python
# -*- coding: utf-8 -*-
import alfred
import requests
from BeautifulSoup import BeautifulSoup
import sys

reload(sys)
sys.setdefaultencoding('utf-8')


def request_baidu_search(query):
    url = u'http://www.baidu.com/s?wd={query}'.replace(u'{query}', query)
    r = requests.get(url)
    return parse_baidu_results(r.content)


def parse_baidu_results(content):
    soup = BeautifulSoup(content)
    tables = soup.findAll('div', {'class': 'result c-container '})
    results = []
    for table in tables:
        part1 = table.find(attrs={'class': 't'})
        title = part1.a.renderContents()
        title = title.replace('<em>', '').replace('</em>', '')
        print title
        url = u'http:' + part1.a['href']
        part2 = table.find('div', {'class': 'c-abstract'})
        desc = part2.renderContents()
        results.append((title, url, desc))
    return results


def main():
    query = alfred.args()[0]
    results = request_baidu_search(query)
    items = [alfred.Item(
        attributes={'uid': alfred.uid(0), 'arg': result[1]},
        title=result[0],
        subtitle=result[2]) for result in results]
    xml_header = u'<?xml version="1.0" encoding="utf-8"?>'
    xml = xml_header + alfred.xml(items)
    alfred.write(xml)


if __name__ == u'__main__':
    sys.exit(main())
```

Edited August 17, 2014 by yekong
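For context, the "Document is empty" error above means Alfred read no parseable XML from STDOUT at all, usually because the script crashed or printed something else first. As a reference point, here is a minimal sketch of the `<items>` feedback document an Alfred 2 Script Filter expects, built with nothing but the standard library (`build_feedback` is a hypothetical helper name, not part of either utility):

```python
from xml.etree.ElementTree import Element, SubElement, tostring


def build_feedback(results):
    # results: list of (title, url) pairs; returns the XML bytes Alfred
    # expects on STDOUT: <items> containing one <item> per result
    root = Element('items')
    for title, url in results:
        item = SubElement(root, 'item', {'arg': url, 'valid': 'yes'})
        SubElement(item, 'title').text = title
        SubElement(item, 'subtitle').text = url
    return tostring(root)


# Anything that is not a document shaped like this on STDOUT triggers
# the NSXMLParserErrorDomain error shown above.
feedback = build_feedback([(u'Example result', u'http://example.com')])
```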
deanishe Posted August 17, 2014 Share Posted August 17, 2014 (edited)

There's a print statement in parse_baidu_results(). That will break the XML output. Either use wf.logger.debug(title) (using my library) or print(title, file=sys.stderr).

These two lines in both your scripts have no effect:

```python
reload(sys)
sys.setdefaultencoding('utf-8')
```

sys.setdefaultencoding() only works if it's called in Python's system-wide start-up file, sitecustomize.py. Changing sitecustomize.py is a very bad idea: it changes the Python environment for all your Python scripts, and your code will likely not work on anybody else's machine.

Edited August 17, 2014 by deanishe
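To make the STDOUT/STDERR split concrete, here is a minimal sketch (the helper names are mine, not from either library) of the discipline a Script Filter script needs: the XML feedback is the only thing written to STDOUT, and all debug chatter goes to STDERR:

```python
from __future__ import print_function  # so print(..., file=...) also works on Python 2
import sys


def debug(msg):
    # Debug output goes to STDERR; Alfred never tries to parse it
    print(msg, file=sys.stderr)


def send_feedback(xml):
    # STDOUT must carry nothing except the XML document Alfred will parse
    sys.stdout.write(xml)


if __name__ == '__main__':
    debug(u'fetched results')  # safe: STDERR only
    send_feedback(u'<?xml version="1.0"?><items></items>')  # the sole STDOUT output
```

A bare `print title` inside the parsing loop writes to STDOUT, so its output lands in the middle of (or before) the XML and the parser gives up.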
deanishe Posted August 17, 2014 Share Posted August 17, 2014

I've got the code up and running, I think. This is from your first script, using my library.

First of all, remove any print statements or make sure they write to STDERR (see previous post). STDOUT is where Alfred reads the XML from, so anything else printed to STDOUT will break the XML.

Secondly, the title, url and desc returned by parse_baidu_results() are all strings, not Unicode. You have to decode these yourself using .decode('utf-8') or wf.decode():

```python
title = wf.decode(title.replace('<em>', '').replace('</em>', ''))
wf.logger.debug(title)
url = u'http:' + wf.decode(part1.a['href'])
part2 = table.find('div', {'class': 'c-abstract'})
desc = wf.decode(part2.renderContents())
```

Working with strings and Unicode is a PITA in Python 2, unfortunately.

Finally, the URL you're retrieving the results from isn't really correct. It won't work with multi-word queries, as query isn't being URL-quoted. Here's a simple way to do it (web.py will correctly encode and quote it for you):

```python
def request_baidu_search(query):
    url = u'http://www.baidu.com/s'
    r = web.get(url, {'wd': query})
```

Here's my full working code:

```python
# encoding: utf-8
from workflow import Workflow, ICON_WEB, web
from BeautifulSoup import *
import sys


def request_baidu_search(query):
    url = u'http://www.baidu.com/s'
    r = web.get(url, {'wd': query})
    r.raise_for_status()
    return parse_baidu_results(r.content)


def parse_baidu_results(content):
    soup = BeautifulSoup(content)
    tables = soup.findAll('div', {'class': 'result c-container '})
    results = []
    for table in tables:
        part1 = table.find(attrs={'class': 't'})
        title = part1.a.renderContents()
        title = wf.decode(title.replace('<em>', '').replace('</em>', ''))
        wf.logger.debug(title)
        url = u'http:' + wf.decode(part1.a['href'])
        part2 = table.find('div', {'class': 'c-abstract'})
        desc = wf.decode(part2.renderContents())
        results.append((title, url, desc))
    return results


def main(wf):
    query = wf.args[0]

    def wrapper():
        return request_baidu_search(query)

    # results = wf.cached_data('results', wrapper, max_age=60)
    results = request_baidu_search(query)
    for result in results:
        wf.add_item(
            title=result[0],
            subtitle=result[1],
            arg=result[1],
            valid=True,
            icon=ICON_WEB)
    wf.send_feedback()


if __name__ == '__main__':
    wf = Workflow()
    sys.exit(wf.run(main))
```
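For anyone not using the library, the quoting that web.get() performs above can be reproduced with the standard library alone. This sketch (the function name is mine) runs on both Python 2 and 3:

```python
try:
    from urllib import urlencode        # Python 2
except ImportError:
    from urllib.parse import urlencode  # Python 3


def baidu_search_url(query):
    # Encode the query to UTF-8 bytes first, then let urlencode
    # percent-encode them; multi-word and Chinese queries both survive
    if not isinstance(query, bytes):
        query = query.encode('utf-8')
    return 'http://www.baidu.com/s?' + urlencode({'wd': query})
```

With this, a query like u'alfred workflow' becomes wd=alfred+workflow instead of producing a malformed request URL.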
yekong Posted August 17, 2014 Author Share Posted August 17, 2014

Very patient and detailed answer, THANK YOU SO MUCH!!! I've struggled for a whole day in vain...
deanishe Posted August 17, 2014 Share Posted August 17, 2014

You're welcome. Your post actually helped me find a bug in web.py, so thanks!

FWIW, what I normally do is make sure everything is Unicode by starting the file with:

```python
# encoding: utf-8
from __future__ import unicode_literals

u = 'This is a Unicode string'
s = b'This is an encoded string'
```

(Instead of using u"" for Unicode strings, you use b"" for encoded strings, so normal "" strings are Unicode.)

And then when I have a library that returns encoded strings, like BeautifulSoup, I'm careful to decode the output as soon as possible. If you use wf.logger.debug('%r', obj) to log objects returned by functions, you'll see in the log whether they're strings or Unicode (%r calls repr() on the object).
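The "decode at the boundary" habit described above can be wrapped in a tiny helper. This is a sketch under my own naming (`ensure_unicode`; the library discussed in this thread provides wf.decode() for the same job), and it runs on both Python 2 and 3:

```python
# encoding: utf-8
from __future__ import unicode_literals


def ensure_unicode(value, encoding='utf-8'):
    # Decode encoded strings the moment they enter your code; values
    # that are already Unicode pass through untouched
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


raw = b'\xe7\x99\xbe\xe5\xba\xa6'  # UTF-8 bytes for 百度 (Baidu), as BeautifulSoup might return
text = ensure_unicode(raw)         # now a real Unicode string
```

Calling a helper like this on everything BeautifulSoup hands back means the rest of the script only ever sees Unicode, which is exactly what the XML feedback needs.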