phyllisstein

alp: A Python Workflow Module


I wouldn't describe it as an error per se. Just something that's worth mentioning in the README as a potential cause for a script not working properly with non-ASCII filepaths if they aren't also normalised the same way. If you call os.listdir(u'/unicode/path'), the returned filenames will be NFD-normalised. So it might be a good idea to mention that decode is altering Alfred's input in such a way that filepaths won't necessarily match those your script gets from the filesystem.
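To illustrate, here's a minimal sketch using the standard library's unicodedata module: the same visible filename can be two different code-point sequences depending on normalisation.

```python
# NFC vs NFD: visually identical strings that don't compare equal
import unicodedata

nfc = u'caf\u00e9'                       # 'café' with a precomposed é (NFC)
nfd = unicodedata.normalize('NFD', nfc)  # 'e' followed by a combining accent

assert nfc != nfd                        # same on screen, different code points
assert unicodedata.normalize('NFC', nfd) == nfc  # normalise before comparing
```

So any comparison between Alfred's input and names returned by os.listdir() should normalise both sides the same way first.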

 

That's a good point regarding .git. I hadn't thought of that as I build my workflows with a script that ignores .git.


We were all trying to keep zipped workflows under 1 MB, so deleting Git's voluminous history always made more sense, but I don't think that's a concern any more.

 

It's probably something we should all pay more attention to. It'd explain why some workflows (mine included) take up more disk space than they should...


This was what always struck me as a bit "off" about alp: it bundles stuff that isn't directly related to Alfred workflows or OS X (basically requests, which in turn bundles urllib3, a practice that is itself a common complaint about requests; the rest of alp is of trivial size).

 

That said, now I'm wondering whether the screenshots included in my workflows for the help files should be really included locally in the workflow or just left on GitHub and referenced via a remote URL…

 

At any rate, it's certainly sub-optimal if every Python/Ruby/PHP workflow includes 1MB of the same libraries. Especially for folks who sync their workflows via Dropbox.


Hah, yeah... my first few workflows, and I think a fair number of others at the time, were very Web-centric; and since bundling Requests was actually a bigger chore than I'd expected, involving a lot of weird bespoke changes to how it imported its own dependencies, I thought bundling it would be more helpful than not. By the time the winds changed, I'd accidentally gotten a day job and left alp to languish, so there Requests stays. That's the story, anyway.



The images aren't too big, but there probably isn't a reason to include them in the workflow itself. I usually sync the workflow folder to another folder that serves as the Git repository and push that to GitHub; I throw the screenshots in there, along with a copy of the workflow. Lately, though, I seem to just upload everything to Packal and not worry about it too much.


Oh, I meant to mention something about the libraries.

 

I talked to David about this several months ago, and he pointed out that most of the Alfred libraries are fairly small. This definitely is the case with his PHP one and the bash library.

 

Stephen, who wrote the great ZotQuery workflow, took a nice approach to the Python libraries he needed by creating an installer that installs them natively on the system if they aren't detected. If more people took that approach for larger libraries, things would be smaller overall. The downside is that this approach is quite a bit harder to implement. We worked on a couple of AppleScript dialogs that let people know what was happening and let them opt out. Also, strangely, OS X ships with easy_install but not pip.
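The detect-then-install idea might be sketched like this (ensure_module is an illustrative name, not ZotQuery's actual code, and the easy_install fallback reflects OS X shipping without pip):

```python
import subprocess

def ensure_module(name):
    """Return True if `name` is importable, installing it natively if not."""
    try:
        __import__(name)
        return True
    except ImportError:
        # Fall back to easy_install, since OS X has it but not pip
        return subprocess.call(['easy_install', '--user', name]) == 0
```

A real installer would also want the opt-out dialog and error reporting discussed above.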

 

Another good point David made about the libraries is that some workflows might depend on particular versions of them, using functions that are later deprecated or restructured, which would break the workflow.

 

So, these are just some thoughts. Maybe put screenshots somewhere else. Bundle small libraries. Install larger libraries.


Indeed, most libraries are small enough to bundle with the workflow.

 

Installing libraries into the system Python is a bad idea. Python has no way to install different versions of the same library in parallel, so sooner or later, something will break when two different pieces of software require two different versions of the same library. Library APIs change relatively often and things break as a result. Requests bundles its own version of urllib3 for the same reason, and this is also why virtualenvs are the de facto standard way of developing/deploying Python applications.

 

You also need root permission to install system libraries (well, at least I do. That might just be because I installed pip as root.)

 

Finally, as a developer, you run the risk of overlooking a dependency because it's installed in your system Python.

 

Personally, I install my own version of Python in /usr/local for all my local scripts, and use the pristine system Python for workflows. Anything that might be deployed elsewhere gets a virtualenv.

 

If a workflow needs some huge libraries, it should probably install them in its data directory in Application Support and add that to sys.path. No huge workflow files, no root permission necessary, and no chance of breaking other software.
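As a sketch (the bundle ID below is just an example, not a real workflow): the workflow prepends its data directory to sys.path before importing anything installed there.

```python
import os
import sys

# Hypothetical workflow data directory under Application Support
LIB_DIR = os.path.expanduser(
    u'~/Library/Application Support/Alfred 2/Workflow Data/'
    u'com.example.myworkflow/lib')

if LIB_DIR not in sys.path:
    sys.path.insert(0, LIB_DIR)
# Libraries installed into LIB_DIR are now importable as usual
```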

Edited by deanishe


@deanishe, 

 

That approach to libraries makes a lot of sense. As someone who stumbled into "development", simply having two separate Python installs would have saved me from *tons* of bugs. Could you point me in the direction of some how-tos/tutorials on moving my modified Python version to /usr/local and returning my system Python to its original state? Is such a move even possible now? Or am I screwed?

 

Also, setting up an explicit step-by-step guide for library inclusion might be worth doing. Working with Shawn on the ZotQuery dependencies alone has shown me that dealing with dependencies in workflows needs some clear thought. Your point about installing straight to root while configuring a workflow makes much sense, as does the suggestion of putting dependency libraries in ~/Library/Application Support/Alfred 2/...: trim the actual workflow, install needed dependencies, put things in predictable locations, but keep things "sandboxed".

Maybe an explicit "Best Practices" guide for Alfred workflow development in Python would be best for standardizing workflow creation and distribution (as much as is possible in a user-generated and user-developed environment such as this). I would definitely be interested in such a thing, and insofar as I was able, I would be willing to help. I think Alfred is a fantastic platform to build on top of, and there is room for some really robust workflows (I'm still pushing features in ZotQuery to get it as close to a replacement for Zotero's GUI as possible), but with such robustness come these types of issues, and offering clear, rational, clean advice would be a great boon to the community.

 

Alright, I'm done mind-meandering now. All that to say: we should settle this issue, write it up, and start implementing it ourselves.


Installing a separate Python

 

Download and install Python for Mac from the Python homepage. It installs into /Library/Frameworks/Python.framework/Versions/2.7/, with /usr/local/bin/python linked to it (the system Python lives under /System). It's probably the best option (rather than, say, using Homebrew), as it comes with some nifty Mac-specific packages, so you can use the Cocoa and Carbon libraries from Python if you need to.

 

Make sure /usr/local/bin comes before /usr/bin in your PATH (I think the Python installer sets this up in your bash profile automatically), but be sure to test workflows using /usr/bin/python.
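For example, in ~/.bash_profile (assuming the framework install described above):

```shell
# Put /usr/local/bin ahead of /usr/bin so "python" resolves to the new install
export PATH="/usr/local/bin:$PATH"

# Workflows should still be tested against the system interpreter explicitly:
#     /usr/bin/python myscript.py
```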

 

If you write any scripts that will be run by other apps (e.g. Hazel or launchd), make sure to use #!/usr/local/bin/python as the first line of the script, or otherwise specify /usr/local/bin/python (rather than just python), if they require any packages you've installed: the system doesn't source your bashrc, so anything not run from bash will use the system Python by default.

 

Follow the instructions for installing pip, and be sure to run get-pip.py with /usr/local/bin/python, not the system one!

 

Now anything you install with pip will go into /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages, where the system Python can't see it.

 

Any packages you've installed in the system Python should be in /Library/Python/2.7/site-packages (which your newly-installed Python will also read from). Removing them should return your system Python to a pristine state.

 

Including libraries in workflows

 

Personally, I think keeping any required libraries in the workflow folder, or a lib subfolder (with an __init__.py, so a lib package, technically) if there's a lot of them, is generally the right way to go. Obviously, it's not ideal if every package is bundling its own version of requests, but I think in most cases requests is overkill: it's a lovely package to use, but for the purposes of most workflows, urllib is more than up to the job.

 

A library that could take care of dependencies by installing them in the workflow's data directory under ~/Library/Application Support/Alfred 2… might be useful for saving Dropbox space (and bandwidth for whoever is hosting the workflow), but may lead to other, bigger problems:

  • If I happen to be offline the first time I try to use a workflow that needs libraries installing, it won't work.
  • The workflow is now also dependent on potentially several external websites being online and functioning the first time it's run (and I've seen PyPI down on several occasions).
  • If the library requires code to be compiled on installation and the user doesn't have Xcode/Apple's Command Line Tools installed, installation will fail.
  • Additional complexity means more can go wrong.

I'm working on my own take on a workflow helper library for Python (I want one like the Ruby library that doesn't die silently if there's an error), so I'll definitely give some more thought to this.

 

It'd be good to have some kind of stats on how big Python workflows tend to be/how many people are bundling large libraries etc. to see if it's worthwhile.

 

I think it might be a better use of time to implement a lightweight version of the requests API: a simple, requests-like module, but without all the session and charset-detection stuff that isn't of great utility for most workflow purposes. Most of requests' bulk is its charset-detection library, which shouldn't be necessary for a workflow that connects to a single web service.
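Such a module might look something like this minimal sketch (illustrative names only; this isn't an existing library), assuming UTF-8 instead of doing charset detection:

```python
try:
    from urllib.request import urlopen   # Python 3
except ImportError:
    from urllib2 import urlopen          # Python 2

class Response(object):
    """Bare-bones stand-in for requests.Response."""

    def __init__(self, raw):
        self.status_code = raw.getcode()
        self.content = raw.read()

    @property
    def text(self):
        # No charset detection: just assume UTF-8
        return self.content.decode('utf-8')

def get(url):
    """GET `url` and return a Response. No sessions, no chardet."""
    return Response(urlopen(url))
```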

Edited by deanishe


So, then, you might consider creating a "standard" shared-library system for Alfred workflows. Maybe in the data directory, under shared_libraries, with per-language subfolders: shared_libraries/python2.7, shared_libraries/php5, shared_libraries/ruby2.0, etc. Then there could be a shared "library" script for downloading these dependencies into the correct places.

 

If you're talking about downloading libraries into the workflow folder, then I have to push back against this, as updating the workflow might get rid of those libraries, forcing the user to re-install them with each update. Keeping them in the workflow's data directory, however, would be the way to go.

 

I don't think the risk of a user being offline the first time they run a workflow is high. I'd imagine that most users download the workflow, install it, then try it out; that means they're online, because they just downloaded it. I think that's a decent working assumption. The script could also check for a live internet connection before it tries to download anything and show a standard error to the user if there isn't one.

 

If you're talking about installing full new versions of Python, etc., rather than just libraries, however, you'll be adding much more to the user's system, and there should be companion uninstall functions. While disk space is fairly cheap for most users, I do imagine that those with machines like MacBook Airs, whose SSDs are integrated into the motherboard, find space scarce, so we shouldn't underestimate that.

 

But, the most important thing is that if most workflows don't need large libraries (large here is up for debate), then this entire endeavor is too much work for too little gain. Although, it's a fun idea.

 

I'm also a big fan of the challenge of writing a workflow that uses as few dependencies as possible, none if possible. It's fun.


I'd suggest using a fixed path like ~/Library/Application Support/Alfred 2/Workflow Data/Python/python to point to the Python interpreter of your choice; this can be as simple as a symlink. And I'd recommend that everyone use a virtualenv, which is standard and lightweight. No need to install another Python: just re-use whatever Python you already have (system, MacPorts, brew, ...).


@Shawn and @nikipore

 

I mean each workflow could install any libraries it needed under its own data directory. With my MailTo workflow, for example, that would be in a subdirectory of ~/Library/Application Support/Alfred 2/Workflow Data/net.deanishe.alfred-mailto. You'd just need to append/prepend the directory to sys.path.

 

Having libraries shared between workflows would still leave the problem of conflicting library versions. If you've got a workflow that hasn't been updated in 2 years, and one that's brand new, it's quite likely that any libraries they share would have API differences that'd break stuff.

 

Storing anything that matters in the workflow directory is a bad idea for exactly the reason Shawn gives.

 

Using virtualenv is, IMO, overkill. It'd need to be installed in the system Python as root, and there'd be a fair amount of faffing around activating the virtualenv in your workflow.

 

With regard to installing versions of Python, we were just talking about installing a version separate from the system default for developers, so you can use that for your own scripts and install whatever libraries you like, while leaving the system version pristine for developing and testing workflows and other code that needs to run on other machines. If you've been installing libraries in your system Python, it's very easy to overlook dependencies that you have come to take for granted (I always forget that docopt isn't a standard module, for example).

Edited by deanishe


Luckily, this should only be a problem in larger, more robust workflows. I've found in my ZotQuery workflow that even the attempts to install necessary packages when configuring isn't 100% fail-proof. I recently had to add the feedparser and pytz packages to the workflow itself to kill a bug that a user had. One thing I noticed when doing so was that you can greatly reduce the workflow's size if you clean out all of the Python Source Files before sharing. That effectively reduced the size of my "dependencies" folder by half. So while I would like a cleaner, lighter solution that would allow me to utilize these non-standard packages (in my case, it's the fact that I use one highly specific package to interact with the Zotero API, which itself relies on a number of non-standard packages/modules), I think that being conscious of .pyc files can help alleviate some of the heft. 


Just a minor thing, but the source files are the .py files. .pyc are compiled, bytecode files. Removing the .pyc files before distribution is definitely a good idea.

 

I wrote a script to build workflows that ignores .pyc files and .git. I should probably upload that …


Thanks for the small correction. I will say, though, on the larger topic, that I at least could use some guidance on dealing with dependencies. Attempting to wrap the packages in the workflow simply didn't work for me: for whatever reason, relative imports constantly failed. My attempts to help users download required packages to their machines are also hit or miss.

The specific problem comes when you rely on highly specific packages for certain tasks. In my case, I MUST use pyzotero, a Python package for interacting with Zotero's API. It's a complex package, and I can't avoid the non-standard dependencies. Still a relative noob in Python, I feel underskilled to provide a bug-free workflow that nonetheless relies on non-standard packages, and I have to imagine I'm not alone in this position. Any thoughts or ideas on creating a water-tight, user-friendly system for dealing with Python-dependent workflows are highly welcome.


Assuming that you're invoking your Python scripts from within a shell script, you might have better luck changing the PYTHONPATH environment variable than trying to muck around with sys.path. You could either standardize on a common path, as the invaluable @nikipore suggests, or use your personal workflow's storage path, then prepend it to PYTHONPATH before calling Python. Then, from within the script, it'd be a matter of wrapping your import in a try/except block that downloaded, extracted, and re-imported the dependency. Not sure if that's super-helpful, though, or even the best way of going about it.

 

Edit: Or better yet, do all the checking and downloading from within your shell script, then just write the Python script assuming the module is already there. That's almost guaranteed to be simpler.
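That could look something like this wrapper for Alfred's Script field (the data-directory path and script name are examples, not real ones):

```shell
# Prepend the workflow's library directory to PYTHONPATH, then hand off
# to Python; libraries installed there become importable without any
# sys.path fiddling inside the scripts themselves
LIBDIR="$HOME/Library/Application Support/Alfred 2/Workflow Data/com.example.myworkflow/lib"
export PYTHONPATH="$LIBDIR${PYTHONPATH:+:$PYTHONPATH}"

# python myscript.py "{query}"
```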

Edited by phyllisstein


As a stop-gap, I rewrote the script that checks for necessary modules to make use of some new bash scripting I've just learned (having done some research to understand the details of phyllisstein's reply). It still installs necessary but missing modules onto the user's system, but it is now fully contained within AS (previously it was a Python script that occasionally invoked AppleScripts that themselves invoked shell scripts... yeah, I know). Since the vast majority of the Python modules my workflow uses are standard, and the few that are likely to be missing are still fairly basic, I don't think installing them directly is all that bad a solution until some cleaner, easier, more water-tight solution presents itself.


I can't give an authoritative answer on the subject because it's complex, nor address any specific problems, as none are given.

 

I suspect one reason may be that many packages are not meant to be used "as-is", but installed via pip/setup.py.

 

I've had a poke around ZotQuery, and I suspect using pip install --target=dependencies pyzotero would have fixed the import problems. That will install pyzotero and its dependencies (pytz, requests, feedparser) in the dependencies directory. Seeing as pyzotero will want to import those libraries, you'll have to add dependencies to sys.path either directly before importing pyzotero or using PYTHONPATH.
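As a sketch (the dependencies folder name matches the pip command above):

```python
# After running, inside the workflow folder:
#     pip install --target=dependencies pyzotero
# pyzotero, pytz, requests and feedparser land in ./dependencies.
# That folder then has to be on sys.path before importing pyzotero,
# so its own `import requests` etc. can resolve:
import os
import sys

DEPS = os.path.abspath('dependencies')
if DEPS not in sys.path:
    sys.path.insert(0, DEPS)

# from pyzotero import zotero   # would now import cleanly
```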

 

Fundamentally, there isn't a "watertight, user-friendly system" for dealing with Python dependencies, as there's no requirement for modules/packages to be packaged and distributed in a certain way. The above-described pip install --target method should get you there 90% of the time with PyPI packages, however. The problems you're having, I suspect, stem mostly from your relative unfamiliarity with Python's way of doing things and the somewhat unusual practice of bundling packages.

 

Helping users install packages into their system Python is a recipe for trouble: what if another workflow (or any other Python software) requires a newer/older version of the same library with a different API? What if you alter your workflow to use a newer version? Then you have to debug a problem with your user's system Python, potentially breaking their other stuff in the process of fixing yours.

 

Keeping everything self-contained within the workflow's private directories is a better idea. Much less potential for things to go wrong.

 

Using PYTHONPATH is cleaner than messing with sys.path in simple cases, but with workflows that have a lot of entry points from Alfred (smarg19's ZotQuery has over a dozen), there's much greater scope for mucking things up simply through the duplication of effort required. It would also make it harder for your users to debug a workflow in Terminal, as they'd have to start faffing with environment variables. Not a big deal, but a lot of people are uncomfortable enough with shells as it is.

 

Any workflow of any size is going to have core functionality in a shared module (e.g. config.py or workflow.py or mypackage etc.) that could take care of amending sys.path just once in a central location.

 

My attitude to workflows is to make the script file(s) as self-contained as possible, leaving the absolute minimum of code in Alfred's Script field. This makes the inevitable debugging a lot easier, especially when assisting less technically-savvy users.


I've pip installed pyzotero directly into my workflow folder, and all the dependencies came with it. Here's the problem: all the dependencies make these relative imports, and these all break when everything is in the workflow folder. I've started down the path of manually changing them, but that's a fool's errand. 

 

How can I put everything in one place, but have them all import each other smoothly? It's driving me crazy...


You'll have to be more specific. Where exactly are you installing them, and what are the error messages?

 

It might be best if you uploaded the broken workflow somewhere, so I can have a proper poke around in it.


Sorry for the delay, I've put the most up-to-date version on my GitHub page. 

 

Specifically, the problem is pyzotero importing its non-standard dependencies. If the modules are installed on my machine, everything works well, but if I use a standard Python dist., the imports all break down. There's no specific example, since it's the whole complex of pyzotero <-> pytz <-> requests <-> feedparser...


Never mind, I figured it out.

 

Like I said earlier, if you're going to install the dependencies in a subfolder, you will have to add that folder to sys.path at the top of every single Python script that imports one of the dependencies.

 

Just install all the dependencies in the top-level workflow directory alongside alp and all your Python scripts, then change your imports from "from dependencies.pyzotero import requests" to plain "import requests".

 

pyzotero can't find its dependencies because it does a plain "import requests", but you've installed requests at dependencies.pyzotero.requests.
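The failure mode is easy to reproduce with a toy package (toypkg and helper are made-up names): a package that does a plain absolute import only works when the imported module's directory is itself on sys.path.

```python
import os
import sys
import tempfile

root = tempfile.mkdtemp()
pkg = os.path.join(root, 'toypkg')
os.mkdir(pkg)
# toypkg does `import helper`, an absolute import like pyzotero's
# `import requests`
with open(os.path.join(pkg, '__init__.py'), 'w') as f:
    f.write('import helper\n')
with open(os.path.join(root, 'helper.py'), 'w') as f:
    f.write('VALUE = 42\n')

sys.path.insert(0, root)  # a "flat" install: helper.py is directly importable
import toypkg             # works; nested one level deeper, it wouldn't
```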

 

It's a good idea to have a lib/packages/dependencies subdirectory to keep external dependencies in, but the way you've structured your workflow (using loads of individual scripts) makes that more work than it's worth.

Edited by deanishe

