
[HELP] Alfred workflow dev with Python always fails with 'Document is empty' XML error



Dear all,

 

I'm using the latest Alfred (v2.4 (279)), and I've been trying to write my first Alfred workflow in Python.

 

Inspired by the tutorial at http://www.deanishe.net/alfred-workflow/tutorial.html#creating-a-new-workflow , I tried using my limited Python knowledge to write one. I've tried two Alfred Python libraries, https://github.com/deanishe/alfred-workflow and https://github.com/nikipore/alfred-python, but NEITHER of them works: both fail with the following error:

 

[ERROR: alfred.workflow.input.scriptfilter] XML Parse Error 'The operation couldn’t be completed. (NSXMLParserErrorDomain error 4.)'. Row (null), Col (null): 'Document is empty' in XML:

 

The strange thing is that the code works on rare occasions, but most of the time it just doesn't. I don't think the problem lies with the two Python libraries, so I must be missing something here...

 

I've also seen the following post, but had no luck with it:

http://www.alfredforum.com/topic/4238-xml-parse-error-in-alfred-22-fixed-23/?hl=%2Berror%3A+%2Balfred.workflow.input.scriptfilter

 

If anyone knows what the issue is, please post a reply. MUCH THANKS!

 

As the code I've written is still very short, I'll just paste it here: one version uses the first library, the other uses the second. The Script Filter command is the same in both workflows: python baidu_now.py "{query}". I'm using Python 2.7.5.

 

Sorry about the code formatting; there doesn't seem to be a way to post code blocks here...

 

============== first ===============

 

# encoding: utf-8

from workflow import Workflow, ICON_WEB, web
from BeautifulSoup import *
import sys
reload(sys)
sys.setdefaultencoding('utf-8')

def request_baidu_search(query):
    url = u'http://www.baidu.com/s?wd=' + query
    r = web.get(url)

    r.raise_for_status()

    return parse_baidu_results(r.content)

def parse_baidu_results(content):
    soup = BeautifulSoup(content)
    tables = soup.findAll('div', {'class': 'result c-container '})
    results = []
    for table in tables:
        part1 = table.find(attrs={'class': 't'})
        title = part1.a.renderContents()
        title = title.replace('<em>', '').replace('</em>', '')
        print title
        url = u'http:' + part1.a['href']
        part2 = table.find('div', {'class': 'c-abstract'})
        desc = part2.renderContents()
        results.append((title, url, desc))
    return results

def main(wf):
    query = wf.args[0]

    def wrapper():
        return request_baidu_search(query)

    results = request_baidu_search(query)  # wf.cached_data('results', wrapper, max_age=60)

    for result in results:
        wf.add_item(
            title=result[0],
            subtitle=result[1],
            arg=result[1],
            valid=True,
            icon=ICON_WEB)
    wf.send_feedback()

if __name__ == '__main__':
    wf = Workflow()
    sys.exit(wf.run(main))
 
 
=========== second ===============
 
# -*- coding: utf-8 -*-
 
import alfred
import requests
from BeautifulSoup import BeautifulSoup
import sys
reload(sys)
sys.setdefaultencoding('utf-8')
 
def request_baidu_search(query):
    url = u'http://www.baidu.com/s?wd=' + query
    r = requests.get(url)
    return parse_baidu_results(r.content)
 
def parse_baidu_results(content):
    soup = BeautifulSoup(content)
    tables = soup.findAll('div', {'class': 'result c-container '})
    results = []
    for table in tables:
        part1 = table.find(attrs={'class': 't'})
        title = part1.a.renderContents()
        title = title.replace('<em>', '').replace('</em>', '')
        print title
        url = u'http:' + part1.a['href']
        part2 = table.find('div', {'class': 'c-abstract'})
        desc = part2.renderContents()
        results.append((title, url, desc))
    return results
 
def main():
    query = alfred.args()[0]
    results = request_baidu_search(query)
    items = [alfred.Item(
                attributes = { 'uid': alfred.uid(0), 'arg': result[1] },
                title = result[0],
                subtitle = result[2])
             for result in results]
    xml_header = u'<?xml version="1.0" encoding="utf-8"?>'
    xml = xml_header + alfred.xml(items)
    alfred.write(xml)
 
if __name__ == u'__main__':
    sys.exit(main())
 

There's a print statement in parse_baidu_results(). That will break the XML output. Either use wf.logger.debug(title) (using my library) or print(title, file=sys.stderr).
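For example, either of these sends debug output to STDERR in Python 2 (title here is just an illustrative value; the print() function form needs the __future__ import):

# A minimal sketch of both options:
from __future__ import print_function
import sys

title = 'some debug value'
print(title, file=sys.stderr)   # goes to STDERR, not into Alfred's XML
# the old Python 2 statement form works too:
# print >> sys.stderr, title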

 

These two lines in both your scripts will have no effect:

reload(sys)
sys.setdefaultencoding('utf-8')

sys.setdefaultencoding() only works if it's called from Python's system-wide start-up file, sitecustomize.py. Changing sitecustomize.py is a very bad idea: it alters the Python environment for every Python script you run, and your code will likely not work on anybody else's machine.
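If you need Unicode, decode explicitly at the point where encoded bytes enter your program instead. A minimal sketch, assuming the argument is UTF-8-encoded (as it normally is coming from Alfred):

import sys

# Decode the query explicitly rather than changing the interpreter's
# default encoding:
query = sys.argv[1].decode('utf-8')   # bytes in, unicode out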


I've got the code up and running, I think. This is from your first script using my library.
 
First of all, remove any print statements or make sure they write to STDERR (see previous post). STDOUT is where Alfred reads the XML from, so anything else printed to STDOUT will break the XML.
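For reference, the only thing a Script Filter should write to STDOUT is the feedback XML itself; here's a rough, hand-rolled sketch of the bare minimum (values are illustrative, and both libraries generate this for you):

import sys

# The minimum Alfred 2 expects on STDOUT: an <items> document
# containing one or more <item> elements.
sys.stdout.write(
    '<?xml version="1.0" encoding="utf-8"?>'
    '<items>'
    '<item uid="example" arg="http://example.com" valid="yes">'
    '<title>Example title</title>'
    '<subtitle>Example subtitle</subtitle>'
    '</item>'
    '</items>')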
 
Secondly, the title, url and desc returned by parse_baidu_results() are all encoded strings, not Unicode. You have to decode them yourself using .decode('utf-8') or wf.decode():

    title = wf.decode(title.replace('<em>', '').replace('</em>', ''))
    wf.logger.debug(title)
    url = u'http:' + wf.decode(part1.a['href'])
    part2 = table.find('div', {'class': 'c-abstract'})
    desc = wf.decode(part2.renderContents())

Working with strings and Unicode is a PITA in Python 2, unfortunately :(
 
 
Finally, the URL you're retrieving the results from isn't quite right. It won't work with multi-word queries, because query isn't being URL-quoted. Here's a simple way to do it (web.py will correctly encode and quote it for you):

def request_baidu_search(query):
    url = u'http://www.baidu.com/s'
    r = web.get(url, {'wd': query})

Here's my full working code:

# encoding: utf-8

from workflow import Workflow, ICON_WEB, web
from BeautifulSoup import *
import sys


def request_baidu_search(query):
    url = u'http://www.baidu.com/s'
    r = web.get(url, {'wd': query})

    r.raise_for_status()

    return parse_baidu_results(r.content)


def parse_baidu_results(content):
    soup = BeautifulSoup(content)
    tables = soup.findAll('div', {'class': 'result c-container '})
    results = []
    for table in tables:
        part1 = table.find(attrs={'class': 't'})
        title = part1.a.renderContents()
        title = wf.decode(title.replace('<em>', '').replace('</em>', ''))
        wf.logger.debug(title)
        url = u'http:' + wf.decode(part1.a['href'])
        part2 = table.find('div', {'class': 'c-abstract'})
        desc = wf.decode(part2.renderContents())
        results.append((title, url, desc))
    return results


def main(wf):
    query = wf.args[0]

    def wrapper():
        return request_baidu_search(query)

    #results = wf.cached_data('results', wrapper, max_age=60)
    results = request_baidu_search(query)

    for result in results:
        wf.add_item(
            title=result[0],
            subtitle=result[1],
            arg=result[1],
            valid=True,
            icon=ICON_WEB)

    wf.send_feedback()

if __name__ == '__main__':
    wf = Workflow()
    sys.exit(wf.run(main))

 

Very patient and detailed answer, THANK YOU SO MUCH!!! I'd struggled for a whole day in vain...


You're welcome. Your post actually helped me find a bug in web.py, so thanks!
 
FWIW, what I normally do is make sure everything is Unicode by starting the file with: 

# encoding: utf-8
 
from __future__ import unicode_literals
 
u = 'This is a Unicode string'
s = b'This is an encoded string'

 
(Instead of using u"" for Unicode strings, you use b"" for encoded strings, so normal "" strings are Unicode.)
 
And then when I have a library that returns encoded strings, like BeautifulSoup, I'm careful to decode the output as soon as possible.
 
If you use wf.logger.debug('%r', obj) to log objects returned by functions, you'll see whether they're strings/Unicode in the log (%r calls repr() on the object).
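A quick illustration of the difference you'd see in the log (assuming wf is a Workflow instance, as in the script above):

# %r makes str vs. unicode visible in the log output:
title_bytes = '\xe7\x99\xbe\xe5\xba\xa6'    # UTF-8-encoded str ('百度')
title_text = title_bytes.decode('utf-8')    # unicode object

wf.logger.debug('%r', title_bytes)  # logs '\xe7\x99\xbe\xe5\xba\xa6'
wf.logger.debug('%r', title_text)   # logs u'\u767e\u5ea6'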

