jmjeong

XML parse error in Alfred 2.2 [Fixed 2.3]


Hi, I am using Alfred v2.2(243)

 

I am the author of alfred-pinboard and other workflows. You can find my workflows at github.com/jmjeong/alfred-extension

 

I have some questions about Alfred workflow extensions. I tried to find documentation about this, but failed.

 

1. Is there any limit on workflow items: a maximum number of items, or a maximum string size?

2. Is there any time limit for a script to be executed?

 

The output of alfred-pinboard is as follows when it is executed in the shell.

 


 

When it is executed in Alfred, an error message is displayed. But it is random: sometimes it is OK, and sometimes it displays an error.

 

Starting debug for 'pinboard'

 

[ERROR: alfred.workflow.input.scriptfilter] XML Parse Error 'The operation couldn’t be completed. (NSXMLParserErrorDomain error 4.)'. Row 1, Col 1: 'Document is empty' in XML:

></item><item arg="https://fnd.io/#/" uid="com.jmjeong.alfredv2.pinboard-33"><title>Experience the App Store and iTunes Anywhere | fnd</title><subtitle /><icon>item.png</icon></item><item arg="http://ludens.kr/741" uid="com.jmjeong.alfredv2.pinboard-34"><title>저작권 문제없는 고퀄리티 사진을 구할 수 있는 사이트 10곳 #Ludens</title><subtitle /><icon>item.png</icon></item><item arg="http://www.istockphoto.com/" 

 

[Image: Screenshot of Alfred Preferences]

 

 

To test whether the resulting XML is wrong, I made a simple workflow that displays the first result.

 


 

The result is ok. 

 

ptest.png

 

 

The first result xml seems to be ok in this test.  

 

I can't guess why alfred-pinboard displays the error. Most of the time it is OK, but sometimes it displays an error.

 

If you need any resources to reproduce this error, I can provide them.

Edited by jmjeong


 


 

I'm going to move this to the workflow help section to see if somebody can help you. Alfred doesn't impose size limits on the scripts / xml, and from that debug log, it looks like the XML being passed back to Alfred is missing some data, so possibly an error in the script.
 
I'll keep an eye on this, if anybody can shed some light on the issue.
 
Cheers,
Andrew


It looks like an encoding issue.

Try adding the encoding argument to your XML header:

 

<?xml version="1.0" encoding="utf-8"?>

It goes without saying that you should also ensure that the XML is encoded with the corresponding, err, encoding.

Edited by deanishe
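A minimal sketch of the suggestion above, assuming the script is written in Python 3.8+ and builds its output with the standard library's ElementTree. The sample entries, the "example-" uid prefix and the function name build_items_xml are illustrative and not taken from alfred-pinboard. Letting the library serialize to UTF-8 bytes and emit the declaration itself keeps the declared encoding and the actual encoding from drifting apart:

```python
import xml.etree.ElementTree as ET


def build_items_xml(entries):
    """Return Script Filter XML as UTF-8 bytes, including an encoding declaration.

    `entries` is a list of (url, title) pairs; the structure mirrors the
    <items>/<item> output shown earlier in this thread.
    """
    root = ET.Element("items")
    for index, (url, title) in enumerate(entries):
        # The uid prefix is a placeholder for illustration only.
        item = ET.SubElement(root, "item", arg=url, uid="example-%d" % index)
        ET.SubElement(item, "title").text = title
        ET.SubElement(item, "subtitle").text = url
        ET.SubElement(item, "icon").text = "item.png"
    # xml_declaration=True forces the <?xml ...?> header; encoding="utf-8"
    # makes tostring() return bytes that really are UTF-8.
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


sample = [
    ("https://fnd.io/#/", "Experience the App Store and iTunes Anywhere | fnd"),
    ("http://ludens.kr/741", "저작권 문제없는 고퀄리티 사진을 구할 수 있는 사이트 10곳 #Ludens"),
]
xml_bytes = build_items_xml(sample)
print(xml_bytes.splitlines()[0])  # the declaration line, shown as a bytes literal
```

A script would then write these bytes to standard output unchanged rather than re-encoding them.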



 

 

Thank you for your comments. Adding the encoding argument solves the problem. 

 

I will check it with another data set.


Today, it happened again.  :-(

I had already added encoding="utf-8" to the XML header.

 

 

13990073744_aeacf0ff03.jpg

 

 

Starting debug for 'pinboard'

 

[ERROR: alfred.workflow.input.scriptfilter] XML Parse Error 'The operation couldn’t be completed. (NSXMLParserErrorDomain error 5.)'. Row 1, Col 189: 'Extra content at the end of the document' in XML:

<item arg="http://www.cocos2d-iphone.org/getting-started/" uid="com.jmjeong.alfredv2.pinboard-27"><title>Getting Started | Obj-C based 2D engine for iOS, OSX and Android</title><subtitle>http://www.cocos2d-iphone.org/getting-started/</subtitle><icon>item.png</icon></item><item arg="http://blog.naver.com/PostView.nhn?blogId=namkoong&logNo=130076721452&parentCategoryNo=&categoryNo=&viewDate=&isShowPopularPosts=false&from=postView" uid="com.jmjeong.alfredv2.pinboard-28"><title> 싱가폴에서의 브롬톤 : 네이버 블로그</title><subtitle>http://blog.naver.com/PostView.nhn?blogId=namkoong&logNo=130076721452&parentCategoryNo=&categoryNo=&viewDate=&isShowPopularPosts=false&from=postView</subtitle><icon>item.png</icon></item><item arg="http://blog.naver.com/PostView.nhn?blogId=neo_flavor&logNo=60195970808" uid="com.jmjeong.alfredv2.pinboard-29"><title> 브롬톤 S2L RG 2013년식, 나의 3번째 브롬톤. : 네이버 블로그</title><subtitle>http://blog.naver.com/PostView.nhn?blogId=neo_flavor&logNo=60195970808</subtitle><icon>item.png</icon></item><item arg="http://jpsarda.tumblr.com/post/24983791554/mix

....

....

 

 

There is no error when the exact same output is displayed with the xmlformat script:

 

cat << EOB

<?xml version="1.0" encoding="utf-8"?><items><item arg="http://shopping.naver.com/search/all_search.nhn?query=%EC%BD%94%EC%BF%A4%20%EA%B7%B8%EB%A6%AC%EB%93%9C%EC%9E%87&frm=NVSCPRO" uid="com.jmjeong.alfredv2.pinboard-0"><title>코쿤 그리드잇 검색결과 : 네이버 지식쇼핑</title><subtitle>http://shopping.naver.com/search/all_search.nhn?query=%EC%BD%94%EC%BF%A4%20%EA%B7%B8%EB%A6%AC%EB%93%9C%EC%9E%87&frm=NVSCPRO</subtitle><icon>item.png</icon></item><item arg="http://radiofun.tumblr.com/post/83412070151/flipboard-layout-duplo" uid="com.jmjeong.alfredv2.pinboard-1"><title>nothing special • Flipboard의 Layout 엔진, Duplo</title><subtitle>http://radiofun.tumblr.com/post/83412070151/flipboard-layout-duplo</subtitle><icon>item.png</icon>

...

EOB

 

 

 

Any comments are welcome. 


How are you generating the XML? Are you sure it's actually UTF-8 encoded?

 

It still looks like an encoding problem (i.e. it's correctly encoded for xmlformat, but not when generated by your script).



 

Thank you for the comments. I will test it more and post the result. 

I made three simple workflows to test this problem. 

 

To isolate the XML error from other unidentified encoding issues, I made some simple scripts that display the XML output. The test workflows are at http://cl.ly/3D3y1b0N2S0X

 

The 'test1' script displays an error.

 

[ERROR: alfred.workflow.input.scriptfilter] XML Parse Error 'The operation couldn’t be completed. (NSXMLParserErrorDomain error 5.)'. Row 1, Col 189: 'Extra content at the end of the document' in XML:

 

The 'test2' script works. 'test2' only adds a single space to the first line.

 

The 'test3' script works. In 'test3', I added a space between "</title>" and "<subtitle>":

 

</title> <subtitle>

 

This symptom is somewhat random. Most of the time it is OK, but sometimes it displays an error.

Any comments are highly appreciated.

Edited by jmjeong


That is starting to look like an error on Alfred's side of things (the XML validates elsewhere). I've had problems myself in the past with Alfred rejecting valid XML.

Could you possibly post the code that generates the XML?

At any rate, adding a space to the beginning of the output seems to fix the problem, so do that.



 

The alfred-pinboard workflow is at https://github.com/jmjeong/alfred-extension/tree/master/pinboard

 

I don't think adding a space to the beginning of the output is the solution, because sometimes that output produces an error while the output without the space works fine. It feels wrong that the result of XML parsing is random.


Thanks for posting the code. I've pinned down the problem (I think). You were on the right track with the Unicode normalisation you commented out.

If you use NFD normalisation (the OS X default) instead of NFC (the Python default), everything works just fine. Note, you're using an older version of alfred.py. It has since been updated to use NFD normalisation (this ensures that command-line arguments match filepaths retrieved from the OS X filesystem).

content = unicodedata.normalize('NFD', content)
Edited by deanishe
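To make the difference between the two forms concrete, here is a small Python 3 sketch, not taken from alfred.py, that normalises one of the Hangul words from the bookmark titles above both ways. NFC keeps each syllable as one precomposed code point, while NFD splits it into its component jamo, so the same visible text has a different length in code points and in UTF-8 bytes:

```python
import unicodedata

# Sample text taken from one of the bookmark titles quoted in this thread.
title = "브롬톤"

nfc = unicodedata.normalize("NFC", title)  # precomposed syllables
nfd = unicodedata.normalize("NFD", title)  # decomposed into jamo

# Same text on screen, different underlying sequences.
print(len(nfc), len(nfd))
print(len(nfc.encode("utf-8")), len(nfd.encode("utf-8")))

# Normalising back to NFC recovers the original composed form.
assert unicodedata.normalize("NFC", nfd) == nfc
```

Because the byte length changes, switching normalisation form also changes where any given byte offset falls within the output.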



 

Thanks for your valuable comment again.

 

Where can I get the newer version of alfred.py?

I will convert the output of alfred-pinboard to 'NFD' normalization and try using it for a while.

 

I think it will take some time to determine whether it is solved, because every single change in the output affects the XML parsing result in Alfred.


I changed the output to 'NFD' normalisation and tested it more. The error is less frequent, but it still happens.

 

I can't help concluding that the NSXMLParser in Alfred is not good at parsing large XML data (120K).

The error code is somewhat random. In fact, there is no '<' in any attribute value in the XML.

 

[ERROR: alfred.workflow.input.scriptfilter] XML Parse Error 'The operation couldn’t be completed. (NSXMLParserErrorDomain error 38.)'. Row 1, Col 5354: 'Unescaped '<' not allowed in attributes values' in XML:

 

I hope the next version of Alfred also supports JSON format for better compatibility.

 

Another issue: 

 

In alfred-pinboard, I pass '&' in the arg field to open a URL in the browser. But I must escape '&' to '&amp;' in XML, and then the page does not open correctly in the browser.

I could find another workaround for it. But I think JSON format is more flexible.


You shouldn't be escaping anything yourself: the XML library used by alfred.py will do that automatically.

If you're using a library like alfred.py, alp or Alfred-Workflow, using XML should be every bit as simple as JSON, and there are distinct advantages to XML, at least with Python.


NSXMLParser is fine with large data files, and 120 kB is very small. It's used by millions of people, so the chances are that the error (and solution) is in your code.

I can't say any more because you're only posting the errors and not the data that is causing them.

Ampersands in URLs work just fine for me.

Also, the code/workflow on GitHub doesn't appear to be the same code you're using: alfred.py is still the old version.

NSXMLParser is fine with large data files, and 120 kB is very small. It's used by millions of people, so the chances are that the error (and solution) is in your code.

 

 
I agree that NSXMLParser is used by millions of people. But I am convinced that the one in Alfred produces errors with some valid XML. I made another test workflow to test the validity of the XML output: http://cl.ly/0m1r1G14003e. It is a bash script that produces XML output, which validates at http://validator.w3.org/check?uri=http%3A%2F%2Fs.jmjeong.com%2Faa.xml&charset=%28detect+automatically%29&doctype=Inline&ss=1&group=0&user-agent=W3C_Validator%2F1.3+http%3A%2F%2Fvalidator.w3.org%2Fservices
 
Alfredapp produces ERROR: [ERROR: alfred.workflow.input.scriptfilter] XML Parse Error 'The operation couldn’t be completed. (NSXMLParserErrorDomain error 76.)'. Row 1, Col 3225: 'Opening and ending tag mismatch: title line 0 and item' in XML:

 

Ampersands in URLs work just fine for me.

 

I know I can add '&' to the arg part of the XML, but alfred.py automatically changes '&' into '&amp;' to keep the XML valid. I would have to change the alfred.py file to produce '&'.

 

Also, the code/workflow on GitHub doesn't appear to be the same code you're using: alfred.py is still the old version. 

 

The code is in test-branch 

Edited by jmjeong


Your 'test4' bash workflow works perfectly in Alfred for me.

USzFuz7.png

Have you changed your launchd environment to use a different encoding to UTF-8?

It is perfectly correct that alfred.py changes '&' to '&amp;'. That is valid XML. Alfred will change them back again when it parses the XML.
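The round trip can be checked outside Alfred. The sketch below uses Python's built-in ElementTree as a stand-in for any conforming XML parser (it says nothing about Alfred's own parser) and one of the URLs from the debug output above: the serializer escapes the ampersand on the way out, and the parser restores it on the way in.

```python
import xml.etree.ElementTree as ET

# URL containing a raw ampersand, taken from the debug output in this thread.
url = "http://blog.naver.com/PostView.nhn?blogId=neo_flavor&logNo=60195970808"

item = ET.Element("item", arg=url)
ET.SubElement(item, "subtitle").text = url

# Serializing escapes '&' as '&amp;' in both the attribute and the text node.
serialized = ET.tostring(item, encoding="unicode")
print(serialized)

# Parsing undoes the escaping, so the consumer sees the original URL again.
parsed = ET.fromstring(serialized)
assert parsed.get("arg") == url
assert parsed.find("subtitle").text == url
```

Escaping the value by hand before passing it to the library would double-escape it, which is one way to end up with a URL that no longer opens.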

Your 'test4' bash workflow works perfectly in Alfred for me.

 

 

Nope. It doesn't work perfectly. There are 418 items in the test4 output XML. It displays only 12 items.

Please see the debug console of the test4 workflow.

Edited by jmjeong


Right, sorry. It's hard to read the XML, as it's all on one line.

I've tried parsing the XML from 'http://s.jmjeong.com/aa.xml' with a few different Python and Ruby parsers.

Everything works perfectly except Alfred…

So, you're probably right: there's a problem with Alfred and/or how it uses NSXMLParser.

Best take this back to the bug reports forum. (I've had problems with Alfred not accepting valid XML myself, but Andrew insisted I was doing something wrong…)


I think the biggest clue is the fact that the logged out XML doesn't start correctly (i.e. you don't see <?xml version="1.0" encoding="utf-8"?><items> at the start), so I'm going to investigate if there is an issue why the output from e.g. the Bash script isn't making it back into Alfred to be parsed.

 

[moving back to bugs for now, I'll report back my findings]

 

deanishe - thanks for your help in trying to debug this issue!

 

Cheers,

Andrew


I think the biggest clue is the fact that the logged out XML doesn't start correctly (i.e. you don't see <?xml version="1.0" encoding="utf-8"?><items> at the start), so I'm going to investigate if there is an issue why the output from e.g. the Bash script isn't making it back into Alfred to be parsed.

 

I see the <?xml version="1.0" encoding="utf-8"?><items> at the start of the line in my debug console.

 

[Image: Screenshot of Alfred Preferences]

 

 

You can download test4 script from http://cl.ly/0m1r1G14003e/test4.alfredworkflow


The XML header is always present in the above test workflows, yet Alfred still throws parse errors.

FWIW, I tried the above-linked XML file with Python's lxml.etree and built-in ElementTree parsers (the latter being the library used to generate the XML in the first place) and Ruby's nokogiri. All worked flawlessly.


I believe I've worked out what is going on. Alfred reads back from the script in arbitrarily sized blocks of bytes as they are made available from the NSTask. In this case, NSTask is returning them with length 4096. Alfred converts each 4096-byte block of data into a UTF8 string, then appends each to a mutable string. When there is no more data, Alfred passes the output string back to the XML parser.

 

Seeing the original screenshot missing the leading XML and the raw XML provided by jmjeong allowed me to see that some of the 4096-byte blocks of data were not correctly resolving as a UTF8 string. It looks like the boundaries (leading or trailing character) of some of the 4096-byte blocks may have been half of a multi-byte character, which was making this string resolve to null.

 

Simplified: The returned XML string was incorrectly being stripped of some data by Alfred.

 

I have just re-written Alfred's task manager to work with a buffer for the data instead of strings, and it looks like the raw XML provided now fully completes. I'm going to test this over the next little while, then put out another pre-release.
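The failure mode described above can be reproduced in a few lines. This is a Python 3 illustration of the idea only, not Alfred's actual Objective-C code: the function names and the synthetic test data are made up for the example, and dropping an undecodable chunk is an assumption standing in for a string that "resolves to null". The data is constructed so that a 3-byte Hangul character straddles the first 4096-byte boundary.

```python
CHUNK_SIZE = 4096  # block size reported in the post above


def split_chunks(data, size=CHUNK_SIZE):
    """Yield consecutive fixed-size slices of a bytes object."""
    for start in range(0, len(data), size):
        yield data[start:start + size]


def decode_per_chunk(data):
    """Decode every chunk separately; an undecodable chunk is lost entirely."""
    parts = []
    for chunk in split_chunks(data):
        try:
            parts.append(chunk.decode("utf-8"))
        except UnicodeDecodeError:
            pass  # assumed behaviour: the chunk becomes null and is skipped
    return "".join(parts)


def decode_buffered(data):
    """Collect all bytes first, then decode once (the buffer-based approach)."""
    buffer = bytearray()
    for chunk in split_chunks(data):
        buffer.extend(chunk)
    return buffer.decode("utf-8")


# 4095 ASCII bytes, then one 3-byte character that crosses the 4096 boundary.
text = "x" * 4095 + "브" + "y" * 4094 + "z" * 100
data = text.encode("utf-8")

print(len(text), len(decode_per_chunk(data)), len(decode_buffered(data)))
```

Both chunks touching the split character fail to decode on their own, so the per-chunk result loses the start of the output, while decoding the accumulated buffer returns the full text. This matches the observation that the symptom depended on exactly where characters fell in the output.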

 

Cheers and thanks for your patience with this issue!

 

Andrew
