Alfred Dependency Downloader Framework



Installations would be to ~/Library/Application Support/Alfred 2/Workflow Data/alfred-bundler/bundle-id-of-package. (It's so simple, there's little reason to create a new install dir if/when the bundler is updated.)

It definitely makes sense to look for/expect requirements.txt to be in the same directory as `info.plist`, i.e. the workflow root.

 

Understood on the import check logic (I think I remember you mentioning this before, but I just wanted to re-air it to be sure: things have been hectic for me).

 

For better organization, I'd prefer 

...../alfred-bundler/workflows/bundle-id

that way we can make sure that the base bundler directory doesn't have way too many folders in it.

 

I do think that we should support requirements.txt in either ./ or ../ just to allow people to place both in a subdirectory if they'd like. Looking for it in two places isn't a big deal and adds a small, but nice, layer of flexibility.

 

I agree on the speed of execution. The bundler mustn't have a noticeable impact on performance. When downloading new assets that impact is unavoidable, but it's a one-time cost; after that, everything should run as quickly as possible, while still checking that everything necessary is in place. So when we would call pip based on the non-existence of files, we should first check whether each package is already present, where that's possible. That would limit the number of pip calls, which is obviously preferable.
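The import-check idea mentioned above could be sketched in Python like this: try the import first and only fall back to pip when it fails. The helper name `module_importable` is illustrative, not from the actual bundler code.

```python
import importlib

def module_importable(name):
    """Return True if `name` can already be imported (so no pip call is needed)."""
    try:
        importlib.import_module(name)
        return True
    except ImportError:
        return False
```

A bundler could loop over the required modules and only invoke pip when one of these checks fails.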

 

Oh, Python is such a wonderful language with the exception of the blight that is dependency management.


Python dependency management is great! Just use pip. Sorted :)

pip does all the checking for you: if the package is installed in the appropriate version, it does nothing. So calling pip only installs what's missing/needs updating. You only need to call it once, too, with the requirements.txt file, not for every package.

The question, as I see it, is what to do when requirements.txt has changed. We can either delete all the packages before running pip, thus ensuring a clean library, or just run pip on the changed requirements.txt, which would be much faster (fewer downloads), but may end up leaving a lot of old cruft in the workflow's library directory.
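The two strategies might look like this in outline (function names are illustrative, not from the Gist; the pip call itself is shown but not run here):

```python
import os
import shutil

def pip_command(lib_dir, requirements):
    """Incremental strategy: pip skips anything already installed, so one
    call per requirements.txt suffices, but old cruft is left behind."""
    return ['pip', 'install', '--target', lib_dir, '-r', requirements]

def reset_lib_dir(lib_dir):
    """Clean strategy: wipe the library directory first, so the next pip
    run produces exactly what requirements.txt specifies."""
    if os.path.isdir(lib_dir):
        shutil.rmtree(lib_dir)
    os.makedirs(lib_dir)
```

The clean strategy guarantees a pristine library at the cost of re-downloading everything; the incremental one is much faster but accumulates orphaned packages.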

Ok. I have an initial version up as a single file Gist

 

As it stands, it simply runs pip with the requirements.txt file if the bundler version doesn't exist (first run) or if it has changed. Dean's question is well worth thinking over. I don't have a great answer yet.

 

Please read through the code and give me some feedback. I did it mostly in a sit-down session today, so there are certainly some errors in it. 

 

stephen


Haven't tried running it yet (it's late, I'm drunk), but the code looks well-written and on point.

I had a hard time following exactly what you're trying to do, however (I am drunk). Good work on the comments, but it's a good idea to also document the intent of the code. For example, what is the purpose of specifying the local target to get_requirements vs the alternative of looking under BUNDLER_PATH for requirements.txt? Why is requirements.txt in the bundler's directory? I've figured this out now, having pored over the code, but it'd be great if your comments specified why, not just what.
 
You shouldn't be copying requirements.txt to the bundler directory and comparing its age to that of the one in the workflow directory to determine changes: I might have rewritten/reinstalled the workflow without changing the contents of requirements.txt, but the mod time will have changed. You need to generate the MD5 hash of requirements.txt and store that in the bundler directory to determine whether it's changed. Best of all, store the hash and the modtime and use a changed modtime as a trigger to perform a hash comparison to determine whether the contents of requirements.txt have changed (see below for code).
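The modtime-as-trigger scheme described above might be sketched like so (the metadata filename and JSON layout are assumptions, not from the actual code):

```python
import json
import os
from hashlib import md5

def file_hash(path):
    """MD5 hex digest of a file's contents."""
    h = md5()
    with open(path, 'rb') as fp:
        h.update(fp.read())
    return h.hexdigest()

def requirements_changed(req_path, info_path):
    """Cheap modtime check first; hash only when the modtime differs."""
    if not os.path.exists(info_path):
        return True  # first run: nothing recorded yet
    with open(info_path) as fp:
        info = json.load(fp)
    if os.path.getmtime(req_path) == info.get('mtime'):
        return False  # untouched since the last successful run
    # Modtime changed (e.g. workflow reinstalled); contents may not have.
    return file_hash(req_path) != info.get('hash')

def save_info(req_path, info_path):
    """Record hash and modtime after a successful install."""
    info = {'mtime': os.path.getmtime(req_path),
            'hash': file_hash(req_path)}
    with open(info_path, 'w') as fp:
        json.dump(info, fp)
```

Reinstalling the workflow bumps the modtime but not the hash, so pip is only re-run when the contents actually change.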

WRT pip, rather than messing about looking for the pip module and installing it from a ZIP file, you should probably just install the get-pip.py script, which is a self-contained version that works exactly the same as an installed pip. We should include get-pip.py as a general bundler utility (like CocoaDialog etc.), as there's no point duplicating it in every workflow.

Massive kudos on considering the presence of a proxy server, but it is, alas, to no avail and should be deleted. pip will automatically use the proxy server specified in the http_proxy environment variable in any case (it's a standard feature of the Python libraries it uses). The --proxy argument is essentially a way to override http_proxy or specify one if http_proxy isn't set.

Unfortunately, it is almost never set in Alfred's calling environment: it doesn't use your shell environment, but that of launchd, which doesn't include all the cool stuff you've set in your ~/.bashrc, ~/.profile etc., so http_proxy will basically never be set except for the rare user that fiddles with launchd's environment.

This is something I've been considering reporting as a feature request/bug with Alfred (it's extremely difficult to determine the proxy settings from a script, as it's dependent on the current network profile, but Cocoa apps can do it easily. Alfred should really set http_proxy before it calls scripts.)

I'll try to have a proper look at the code over the weekend: I'm very busy until then :(

----------------------------------------

To hash a file, do:

from hashlib import md5

h = md5()
with open('requirements.txt', 'rb') as fp:
    h.update(fp.read())
digest = h.hexdigest()  # store this value

Unfortunately, it is almost never set in Alfred's calling environment: it doesn't use your shell environment, but that of launchd, which doesn't include all the cool stuff you've set in your ~/.bashrc, ~/.profile etc., so http_proxy will basically never be set except for the rare user that fiddles with launchd's environment.

 

I haven't tried this yet, but I wonder whether adding the line

source ~/.bashrc

to the scripts would load the bash profile...


Sure, that would work from bash scripts (no help if you use zsh, for example), but that's still assuming that folks have set http_proxy in their bash environment, which I'd say is probably rather unusual.

Proxies are normally set via OS X's Network preference pane, and that does not propagate the setting to the shell (unlike Linux). It's also profile-specific (so you can use a proxy at work, but not at home, for example).

I've looked into it (briefly), and trying to figure out the active network profile and its proxy server is a real PITA. This really is something Alfred needs to do. It applies to every workflow, and is much easier for Alfred to figure out via Cocoa than for us script-bound plebs.
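For what it's worth, OS X does expose the active profile's proxy settings via `scutil --proxies`; getting at them from a script is doable, if inelegant. A sketch (the `Key : value` parsing is an assumption about scutil's output format, and `http_proxy_from_scutil` is an illustrative name):

```python
import re
import subprocess

def parse_proxies(scutil_output):
    """Pull `Key : value` pairs out of `scutil --proxies` output."""
    proxies = {}
    for match in re.finditer(r'^\s*(\w+)\s*:\s*(\S+)', scutil_output, re.M):
        proxies[match.group(1)] = match.group(2)
    return proxies

def http_proxy_from_scutil():
    """Return `host:port` for the active HTTP proxy, or None (OS X only)."""
    out = subprocess.check_output(['scutil', '--proxies']).decode('utf-8')
    info = parse_proxies(out)
    if info.get('HTTPEnable') == '1':
        return '%s:%s' % (info['HTTPProxy'], info['HTTPPort'])
    return None
```

Because the settings are per network profile, this has to be re-run whenever the location changes, which is exactly why it would be nicer if Alfred set http_proxy itself.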

I've just added it as a feature request.


what is the purpose of specifying the local target to get_requirements vs the alternative of looking under BUNDLER_PATH for requirements.txt? Why is requirements.txt in the bundler's directory? I've figured this out now, having pored over the code, but it'd be great if your comments specified why, not just what.

That was mostly to keep the number of functions down. I could easily split that into two separate functions, though as you say below, having two copies of the requirements.txt file isn't necessary. So when I change that, I can alter that function as well. The point about documenting intent is well taken, though.

You shouldn't be copying requirements.txt to the bundler directory and comparing its age to that of the one in the workflow directory to determine changes: I might have rewritten/reinstalled the workflow without changing the contents of requirements.txt, but the mod time will have changed. You need to generate the MD5 hash of requirements.txt and store that in the bundler directory to determine whether it's changed. Best of all, store the hash and the modtime and use a changed modtime as a trigger to perform a hash comparison to determine whether the contents of requirements.txt have changed (see below for code).

Good point. Will change accordingly.

WRT pip, rather than messing about looking for the pip module and installing it from a ZIP file, you should probably just install the get-pip.py script, which is a self-contained version that works exactly the same as an installed pip. We should include get-pip.py as a general bundler utility (like CocoaDialog etc.), as there's no point duplicating it in every workflow.

I actually was using get-pip.py in the first version, but I was using it to download pip to the user's PATH, which, as you said, was far from ideal. I didn't even think that we could use it to execute the code to install the requirements (in the same way that the script uses pip to install pip). That is actually quite brilliant. I will alter the code accordingly. Great thought!

Massive kudos on considering the presence of a proxy server, but it is, alas, to no avail and should be deleted. pip will automatically use the proxy server specified in the http_proxy environment variable in any case (it's a standard feature of the Python libraries it uses). The --proxy argument is essentially a way to override http_proxy or specify one if http_proxy isn't set.

Will do; taking it out.

To hash a file, do:

from hashlib import md5

h = md5()
with open('requirements.txt', 'rb') as fp:
    h.update(fp.read())
digest = h.hexdigest()  # store this value

Thanks. Needed this. I'll fiddle with this all tomorrow


Ok. I've updated the script. View the Gist here.

Changes:

  • No longer copy requirements.txt to workflow's bundler directory.
  • Create info.txt file in workflow's bundler directory with JSON dict of requirements.txt's hash and mod time.
  • Only runs pip if info.txt doesn't exist (first run), or hash is different.
  • Only checks if hash is different if mod time is different.
  • Downloads get-pip.py from GitHub if the user doesn't have pip installed.
  • Imports pip from zip file created in temporary directory, which is deleted after running (this mirrors get-pip.py's functionality, though it never actually runs that script, which would install pip on the user's system).
  • Actually modify sys.path upon completion.
  • General code cleanup
For ZotQuery, this code takes ~9 seconds to run on the first go. It installs 6 modules/packages totalling 3.7 MB in that time, as well as get-pip.py (so that is as cold as it gets). After that, I could import modules from that directory in 0.3 seconds. Here's the syntax:

import bundler
bundler.init()
import pytz
# etc...
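For reference, `bundler.init()` presumably just needs to put the workflow's library directory at the front of `sys.path`; a minimal sketch of that step (the real Gist may do more):

```python
import sys

def init(lib_dir):
    """Make the bundler-managed packages importable for this process."""
    if lib_dir not in sys.path:
        sys.path.insert(0, lib_dir)
```

Inserting at position 0 ensures the bundled versions win over any same-named packages elsewhere on the path.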
Look it over and offer more corrections. But as it stands, this is functioning for me (woo-hoo!).

stephen

Edited by smarg19

Again, I haven't run the code yet, but a couple of things stand out. Why are you creating empty files here and there? Directories I understand, but what's the purpose of creating an empty get_pip.py file?

If something goes wrong between touching that file and saving pip into it, the bundler will believe that pip is installed when it isn't.

Is it necessary to extract pip's embedded zip? Can't you just run the downloaded get-pip.py with subprocess:

 

subprocess.call(['/usr/bin/python', '/path/to/get-pip.py', '--target', '/path/to/lib/dir', '-r', '/path/to/requirements.txt'])

Also, you're saving the hash and modtime of requirements.txt before it's been installed. That's bad, because if the install fails (e.g. the computer or PyPI is offline), on the next run the bundler will think everything has been successfully installed due to the presence of the info.txt file. (Shouldn't that be called info.json?)

You need to check whether pip ran successfully and only then save the metadata. What to do if pip fails is another thorny question. Delete everything and start again or just try to run it again?
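One hedge against a half-finished install: only persist the metadata when pip's exit code is zero. In this sketch `install_requirements` is an illustrative name and `save_info` is a callback standing in for whatever writes the hash/modtime file:

```python
import subprocess

def install_requirements(pip_cmd, save_info):
    """Run pip; record success metadata only if it exited cleanly.

    `pip_cmd` is the full argument list for subprocess; `save_info` is a
    callback that writes the hash/modtime metadata. Returns True on
    success, False if the install should simply be retried next run.
    """
    if subprocess.call(pip_cmd) != 0:
        return False  # no metadata saved, so the next run tries again
    save_info()
    return True
```

With this ordering, a failed install (offline machine, PyPI down) leaves no info file behind, so the bundler naturally retries rather than believing everything is in place.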

Edited by deanishe

Why are you creating empty files here and there? Directories I understand, but what's the purpose of creating an empty get_pip.py file?

Good point. Will change. I was doing it for parallelism I guess.

Is it necessary to extract pip's embedded zip? Can't you just run the downloaded get-pip.py with subprocess:

 

subprocess.call(['/usr/bin/python', '/path/to/get-pip.py', '--target', '/path/to/lib/dir', '-r', '/path/to/requirements.txt'])

You could do this, but it will install pip to the user's system before it then installs all the dependencies. I got the feeling that this was frowned upon. If it isn't, that would definitely make things a bit simpler.

Also, you're saving the hash and modtime of requirements.txt before it's been installed. That's bad, because if the install fails (e.g. the computer or PyPI is offline), on the next run the bundler will think everything has been successfully installed due to the presence of the info.txt file. (Shouldn't that be called info.json?)

Good point. Will change.


You could do this, but it will install pip to the user's system before it then installs all the dependencies. I got the feeling that this was frowned upon. If it isn't, that would definitely make things a bit simpler.

You're right. It first installs itself, but in the directory you specify, not the system Python.

IMO, the pip install should be taken care of by the bundler library, as with other utilities. Of course, then your cool script will no longer run without bundler.

Edited by deanishe

Also, I'd change a few of the variable names to make their purpose clearer, e.g. target_path -> start_dir, and provide a clearer error when you raise BundlerError. Something like "Couldn't find Bundler directory. Is it installed? Get it from http://..."

 

Actually, no error should be thrown because the bundler, at its core, is supposed to be a fallback to prevent errors. So, if the bundler isn't installed, it isn't an error; it should invoke an action to install the bundler and then, hopefully, resume.


Yup. Of course, we'd want to write the Python wrapper around it first…

I don't think any bash code should be necessary. If it is, I'm out! I hate bash :D

 

Oh, how you do.

 

But, I've already provided a 'misc' wrapper that you can wrap Python around. The wrapper you'd want to use is "alfred.bundler.misc.sh", which can be found in

$HOME/Library/Application Support/Alfred 2/Workflow Data/alfred.bundler-aries/wrappers/alfred.bundler.misc.sh

It's basically there for other languages to have an easy way to load utilities. It takes arguments the same way the PHP and Bash versions do, so if you call

..../alfred.bundler.misc.sh "Terminal Notifier" "default" "utility"

Then you'll get the full path back to Terminal Notifier (if installed; otherwise it gets installed and you get the path afterwards). Just save the output as a variable and use it in a system call with the arguments that TN needs.

 

A fourth argument can be passed to the script; it should be a path to a valid JSON file (that follows the bundler's JSON format), and the bundler will use that information for that asset. So, it can be just about any utility that you want there.

 

The script also installs the bundler if not available. Here's the relevant code that should be ported:

# This just downloads the install script and starts it up.
function __installBundler {
  local installer="https://raw.githubusercontent.com/shawnrice/alfred-bundler/$bundler_version/meta/installer.sh"
  dir "$__cache"
  dir "$__cache/installer"
  dir "$__data"
  curl -sL "$installer" > "$__cache/installer/installer.sh"
  sh "$__cache/installer/installer.sh"
}

# Just a helper function to make a directory if it doesn't exist.
function dir {
 if [ ! -d "$1" ]; then
  mkdir "$1"
 fi
}

if [ ! -f "$__data/bundler.sh" ]; then
 __installBundler
fi

# Include the bundler.
. "$__data/bundler.sh"

So, the "dir" command is just a custom function that checks whether a directory exists and, if it doesn't, makes it. As you can see, the first check in the script (in Bash, functions need to be defined before they're called) just checks whether 'bundler.sh' is installed in the bundler data directory. If it isn't, it downloads and installs the bundler with these two lines:

  curl -sL "$installer" > "$__cache/installer/installer.sh"
  sh "$__cache/installer/installer.sh"

So, it just downloads the install script from 

https://raw.githubusercontent.com/shawnrice/alfred-bundler/$bundler_version/meta/installer.sh

 into the bundler's cache directory, and then it runs it. That's it.

 

There should probably be more sophisticated checks for the bundler's installation, but this is good enough for government work (read: initial release). But, the key is that the bundler should install itself only through this method: download the install script to the cache directory and run the script.
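A rough Python port of that self-install flow might look like this. The directory arguments mirror the Bash script's `$__cache`/`$__data` variables; treat the whole thing as a sketch, not the canonical implementation:

```python
import os
import subprocess

def bundler_installed(data_dir):
    """Mirror of the Bash check: is bundler.sh in the data directory?"""
    return os.path.exists(os.path.join(data_dir, 'bundler.sh'))

def install_bundler(cache_dir, data_dir, version):
    """Download the installer script to the cache directory and run it."""
    installer_url = ('https://raw.githubusercontent.com/shawnrice/'
                     'alfred-bundler/%s/meta/installer.sh' % version)
    installer_dir = os.path.join(cache_dir, 'installer')
    for path in (cache_dir, installer_dir, data_dir):
        if not os.path.isdir(path):
            os.makedirs(path)
    script = os.path.join(installer_dir, 'installer.sh')
    # Same two steps as the Bash version: fetch, then run.
    subprocess.check_call(['curl', '-sL', installer_url, '-o', script])
    subprocess.check_call(['sh', script])
```

A Python wrapper would call `install_bundler()` only when `bundler_installed()` returns False, exactly as the Bash snippet does.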

 

Now, all of the wrappers are downloaded into the bundler directory under "wrappers," so the python wrapper could just use this wrapper to load utilities, constructing a call to it via the above syntax.

 

So, Dean, good news: all the bash is written. You don't need to touch any of it! Just use it.

Edited by Shawn Rice

As an afterthought: here are two good ways to think about what the bundler should be:

(1) It is a wrapper to load utilities that automatically handles errors by downloading/installing missing assets as they are called (lazy loading?).

(2) It is a wrapper built around wrappers of wrappers. So, as much of the code that's there as possible should be reused (read: used as a wrapper).

 

Right now, the fundamental flaw in (1) is that it doesn't have an elegant way to notify the workflow if it cannot download an asset (especially if you try to use it on a first run without an internet connection).


Actually, no error should be thrown because the bundler, at its core, is supposed to be a fallback to prevent errors. So, if the bundler isn't installed, it isn't an error; it should invoke an action to install the bundler and then, hopefully, resume.

 

What happens if there's no 'net connection or the server isn't responding?

 

So, Dean, good news: all the bash is written. You don't need to touch any of it! Just use it.

 

:P  I know. Just joshing.


Right now, the fundamental flaw in (1) is that it doesn't have an elegant way to notify the workflow if it cannot download an asset (especially if you try to use it on a first run without an internet connection).

 

I guess that answers my question.

 

Could use exit codes to do that in the bash version (exit codes are 0–255, so negative values won't survive): 1 = unspecified error, 2 = network error, 3 = user (i.e. input) error.

 

Languages wrapping that could turn the exit codes into proper errors/exceptions. What's the standard PHP way to catch/handle errors?


What happens if there's no 'net connection or the server isn't responding?

 

Assumption #1: GitHub will always respond (not a bad assumption).

Assumption #2: The workflow author will take care of this.

 

For #2, I ran into that problem a bit with Alfred Cron. I still haven't figured out a good way to do this across all languages.

 

Basically, for PHP, I think I'd need to do something like:

$connection = @fsockopen("github.com", 80, $errno, $errstr, 30);

if ( ! $connection ) { 
    // Create some standard Alfred Script Filter XML to notify of the error, apologize,
    // tell them to connect to the damn internet, and let them know that, until they do,
    // they're fucked.
    die();
}

fclose( $connection );

// Continue with script

For PHP, there are also try/catch statements. But sending errors to Alfred is problematic because it just makes the workflow crash rather than telling the user what happened.


You're right. It first installs itself, but in the directory you specify, not the system Python.

IMO, the pip install should be taken care of by the bundler library, as with other utilities. Of course, then your cool script will no longer run without bundler.

 

I think what this means is that you need to include a JSON file with the following contents:

{
    "name": "Pip",
    "type": "utility",
    "versions": {
        "default": {
            "invoke": "get-pip.py",
            "files": [
                {
                    "url": "https://raw.githubusercontent.com/pypa/pip/master/contrib/get-pip.py",
                    "method": "direct"
                }
            ]
        }
    }
}

And, then, you need to use Python to construct the following command:

sh "$HOME/Library/Application Support/Alfred 2/Workflow Data/alfred.bundler-aries/wrappers/alfred.bundler.misc.sh" "Pip" "default" "utility"

Save the output as the path that you'll need to invoke Pip.

 

You could also do that as a fallback: check whether the Pip file exists in the proper place and, if not, make that call, which will download it if it isn't there.
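From Python, constructing that call and capturing the path could look like this (the wrapper path matches the one quoted above; `utility_path` is an illustrative name):

```python
import os
import subprocess

WRAPPER = os.path.expanduser(
    '~/Library/Application Support/Alfred 2/Workflow Data/'
    'alfred.bundler-aries/wrappers/alfred.bundler.misc.sh')

def utility_path(name, version='default', wrapper=WRAPPER):
    """Ask the misc wrapper for a utility's path, installing it if needed.

    The wrapper prints the full path to the asset on stdout; we capture
    and strip it so it can be used directly in a later system call.
    """
    cmd = ['sh', wrapper, name, version, 'utility']
    return subprocess.check_output(cmd).decode('utf-8').strip()
```

Usage would then be `pip_script = utility_path('Pip')`, after which the returned path can be invoked with `/usr/bin/python`.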


Assumption #1: GitHub will always respond (not a bad assumption).

Assumption #2: The workflow author will take care of this.

 

For #2, I ran into that problem a bit with Alfred Cron. I still haven't figured out a good way to do this across all languages.

 

Basically, for PHP, I think I'd need to do something like:

$connection = @fsockopen("github.com", 80, $errno, $errstr, 30);

if ( ! $connection ) { 
    // Create some standard Alfred Script Filter XML to notify of the error, apologize,
    // tell them to connect to the damn internet, and let them know that, until they do,
    // they're fucked.
    die();
}

fclose( $connection );

// Continue with script

For PHP, there are also try/catch statements. But sending errors to Alfred is problematic because it just makes the workflow crash rather than telling the user what happened.

 

I'm not sure it's a great idea to leave it up to the author. That makes the framework something else to worry about rather than a useful tool. Best to specify which errors may occur and what the bundler will do in that case (throw a PHP error they should try to catch, populate an optional $error variable).

 

What I did in my Python framework (pinched the idea from a Ruby one) is to provide a wrapper function to call the workflow code. The wrapper catches any errors and displays an error in Alfred. The bundler could similarly "hijack" execution, display an error and exit.
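A sketch of that hijack-and-display pattern: the wrapper runs the workflow code, and on any exception emits an Alfred script-filter error item instead of letting the workflow die silently. The XML layout follows Alfred 2's script-filter output; `run_workflow` is an illustrative name, not the actual framework API.

```python
import sys
from xml.sax.saxutils import escape

def run_workflow(func):
    """Run `func`; on any exception, show an error item in Alfred."""
    try:
        func()
    except Exception as err:
        # Emit a non-actionable item so the user sees what went wrong.
        sys.stdout.write(
            '<?xml version="1.0"?><items>'
            '<item valid="no"><title>Error: %s</title>'
            '<subtitle>Check the workflow log for details</subtitle>'
            '</item></items>' % escape(str(err)))
```

The bundler could use the same trick for download failures: catch the network error, print an item explaining the problem, and exit cleanly.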

Edited by deanishe

I think what this means is that you need to include a JSON file with the following contents:

{
    "name": "Pip",
    "type": "utility",
    "versions": {
        "default": {
            "invoke": "get-pip.py",
            "files": [
                {
                    "url": "https://raw.githubusercontent.com/pypa/pip/master/contrib/get-pip.py",
                    "method": "direct"
                }
            ]
        }
    }
}
And, then, you need to use Python to construct the following command:

sh "$HOME/Library/Application Support/Alfred 2/Workflow Data/alfred.bundler-aries/wrappers/alfred.bundler.misc.sh" "Pip" "default" "utility"
Save the output as the path that you'll need to invoke Pip.

 

You could also do that as a fallback: check whether the Pip file exists in the proper place and, if not, make that call, which will download it if it isn't there.

The get-pip.py file needs to be extracted/installed and a simple wrapper script (the pip executable) created (or maybe not; I think you can import pip and use it that way). Probably the cleanest solution is to point to a pip-installer wrapper instead of get-pip.py itself.

I'll look into it at the weekend if Stephen doesn't beat me to it.

Edited by deanishe
