Florian Posted December 18, 2014 Share Posted December 18, 2014 Very interesting. Do you have some convenient way for users of your workflow to spin up a background worker? Or is this applicable to your own, non-shared workflows? You can observe this with my tv shows workflow (it also used to be in my piratebay workflow but we all know what happened there ...). For both of these, see my signature. Link to comment
rice.shawn Posted December 18, 2014 Share Posted December 18, 2014 I'm not following this. If like Aaron B. you have a query that takes 2-3 seconds to return, how does using a background process help? As soon as your Script Filter exits, it no longer has a way to return results to Alfred, so if it hands off to a background process and exits, how are the results returned to Alfred? Or is this only for non-Script Filter applications? How would a Script Filter process kill its "siblings"? Alfred will only run one at a time, so there can't be any siblings unless Alfred isn't working properly. Depending on the type and the interaction, it can either rely on "stock" or cached results while it refreshes the cache in the background (but the filter is always one cache behind), or it can call back into Alfred with an external trigger when the search is done. I was actually thinking of a model along the lines of the latter when I was writing a workflow to search for streams, and the APIs were infuriatingly slow. So, it basically offered cached data but was refreshed when the final query came in, and so those were the sibling processes that I killed. It never quite worked the way I wanted it to, however. ---- For the background server, I'm trying to bake this into a library that I'm writing for PHP. Building off the initial idea that Florian and I worked out, see below... These aren't finished, but if you look in this part of the repo, you'll see a few files: (1) server.sh, (2) kill.sh, (3) server.php. The idea is that for your workflow, you'd just have the contents of the box be something like: bash server.sh "path/to/my_main_script.php" "{query}" So, the server script will look for a PHP cli-server and, if it doesn't find one, launch it along with a kill script (kill.sh) that will shut down the server after a period of inactivity. If the minimum number of keystrokes has been reached, then the query is passed through to the script. 
Also, the PHP script should still work without being run via the server, but running under a server vs. the CLI gives you different environment variables and different places where the query comes in. So, if you just include the `server.php` script at the beginning of the file, it'll make sure the variables are set for you. The code isn't fully functional, but it's an idea, and it works best for the use case where you need to make things snappier; it doesn't offload the queries. More to come on that later. The workarounds are fun. Link to comment
bachya Posted December 19, 2014 Share Posted December 19, 2014 I see what you're doing there. Really neat idea. Link to comment
deanishe Posted December 19, 2014 Share Posted December 19, 2014 Calling back into Alfred an indeterminate time later from a background script sounds like a risky thing to do. It'd be very annoying if a user is doing something else when the script returns. I don't really follow what you're trying to do with the server. As far as I can tell, instead of running the script directly, it just asks a daemon process to launch it instead. Have I understood that right? FWIW, I usually go with caching. All queries get cached for a few seconds and, where possible, I'll cache the entire dataset (e.g. all contacts or all bookmarks) using a background script. The workflow is very often working with "stale" data from the cache, but this usually isn't a problem in practice. Typically, the cache will also be refreshed before you've finished using the workflow. Link to comment
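For readers who want to see the caching approach described above in concrete terms, here is a minimal Python sketch. It is not deanishe's actual code: the cache directory, the `cached_results` helper, and the 5-second default age are all assumptions made for illustration, and `fetch` stands in for whatever slow call the workflow makes.

```python
import hashlib
import json
import os
import tempfile
import time

# Hypothetical cache location; a real workflow would use its own cache directory.
CACHE_DIR = os.path.join(tempfile.gettempdir(), "workflow-cache-demo")


def cache_path(query):
    """Map a query string to a cache file path."""
    digest = hashlib.sha256(query.encode("utf-8")).hexdigest()
    return os.path.join(CACHE_DIR, digest + ".json")


def cached_results(query, fetch, max_age=5.0):
    """Return cached results for `query` if they are younger than `max_age` seconds.

    Otherwise call `fetch(query)` (the slow part), store its result, and return it.
    """
    os.makedirs(CACHE_DIR, exist_ok=True)
    path = cache_path(query)
    if os.path.exists(path) and time.time() - os.path.getmtime(path) < max_age:
        with open(path) as fp:
            return json.load(fp)
    results = fetch(query)
    with open(path, "w") as fp:
        json.dump(results, fp)
    return results
```

Caching the entire dataset from a background script, as mentioned above, would follow the same pattern with a single cache file and a much longer `max_age`.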
wolph Posted December 19, 2014 Share Posted December 19, 2014 Calling back into Alfred an indeterminate time later from a background script sounds like a risky thing to do. It'd be very annoying if a user is doing something else when the script returns. I don't really follow what you're trying to do with the server. As far as I can tell, instead of running the script directly, it just asks a daemon process to launch it instead. Have I understood that right? FWIW, I usually go with caching. All queries get cached for a few seconds and, where possible, I'll cache the entire dataset (e.g. all contacts or all bookmarks) using a background script. The workflow is very often working with "stale" data from the cache, but this usually isn't a problem in practice. Typically, the cache will also be refreshed before you've finished using the workflow. That's something you shouldn't do... the flow I would imagine: With every character that comes through the scriptfilter the script checks if a daemon is running and if not, starts the daemon. If the daemon is running, send a new job to the daemon and return the last result from the daemon (which is probably outdated). In the meanwhile, the daemon always processes the last request it gets and drops all other requests. Once it's done processing, wait for a set period (e.g. 10 seconds) and if no new request comes, exit. If someone is really interested in using this I can write up a python demo. Link to comment
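The flow wolph describes (always hand back the last finished result, process only the newest request, exit after an idle period) can be sketched roughly as below. This is only an illustration under loud assumptions: a thread inside one process stands in for the separate daemon, so the inter-process part (socket, file, etc.) is left out entirely, and the `LatestOnlyWorker` name and its methods are made up for this example.

```python
import threading


class LatestOnlyWorker:
    """Processes only the newest submitted query; older unprocessed ones are dropped."""

    def __init__(self, process, idle_timeout=10.0):
        self._process = process            # the slow function, e.g. an API call
        self._idle_timeout = idle_timeout  # seconds to wait for new work before exiting
        self._cond = threading.Condition()
        self._pending = None               # newest query not yet processed
        self._has_pending = False
        self._running = False              # True while the worker thread is alive
        self.last_result = None            # result of the last finished query

    def submit(self, query):
        """Replace any unprocessed query with `query`; return the last finished result."""
        with self._cond:
            self._pending = query
            self._has_pending = True
            if not self._running:
                # "If the daemon isn't running, start it."
                self._running = True
                threading.Thread(target=self._run, daemon=True).start()
            self._cond.notify()
            return self.last_result

    def _run(self):
        while True:
            with self._cond:
                # Wait for work; give up once the idle timeout passes with none.
                if not self._cond.wait_for(lambda: self._has_pending,
                                           self._idle_timeout):
                    self._running = False
                    return
                query = self._pending
                self._has_pending = False
            result = self._process(query)  # slow call runs outside the lock
            with self._cond:               # (error handling omitted in this sketch)
                self.last_result = result
```

Note that `submit` returns whatever finished last, which may be several keystrokes old, and that the result for the final query is computed but never shown unless the Script Filter is run again.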
deanishe Posted December 19, 2014 Share Posted December 19, 2014 (edited) If the daemon always returns the set of results from the last successfully executed query, how is your workflow ever going to get the final set of results for the final query? The user has finished entering his/her query, but your daemon has already returned an old set of results, and there's no way for it to return the set of results the user actually wants because it won't get called again if the user has stopped typing. In addition, if your daemon kills all existing connections when a new query comes in, there's no guarantee that it will have any old results to return (from a previously successful API call), and if it does, they may well be 5 or 6 iterations old, so you end up with the same situation as now: you're looking at the results for the first two characters of your query while typing the eighth character. You also need to worry about rate limiting, so you're not hammering an API (which may get you banned). Alfred's execution model works as a kind of ghetto rate limiter: your script won't hit the API again till the current request is done. The slower the API, the longer the pauses between requests. If you have a daemon initiating and killing connections on every keypress, you'll end up making several API requests/second. As a rule, the slower an API is, the less amenable it is to being hammered with requests to "speed it up". Edited December 19, 2014 by deanishe Link to comment
Florian Posted December 19, 2014 Share Posted December 19, 2014 (edited) "ghetto rate limiter" Bottom line is there aren't any pretty solutions. Just a bunch of hacks. This needs to be an official feature. Edited December 19, 2014 by Florian Link to comment
wolph Posted December 19, 2014 Share Posted December 19, 2014 If the daemon always returns the set of results from the last successfully executed query, how is your workflow ever going to get the final set of results for the final query? The user has finished entering his/her query, but your daemon has already returned an old set of results, and there's no way for it to return the set of results the user actually wants because it won't get called again if the user has stopped typing. I never said it's without flaws. If the results are indeed too slow to return within a set period of time (whatever time you want to wait in your scriptfilter), it won't return. This can easily be solved by sending the result to a different command in case the user wants to wait longer. The standard way Alfred works is like this: 1. you start typing 2. alfred starts processing the first input 3. alfred waits until your script is finally done processing and it will simply display no results until it's done 4. after the first scriptfilter is finished processing, a new process launches with the current input and the input from the first one is displayed 5. repeat 4 until the user is done typing and the results for the last one are displayed Using the system as I proposed has the flaw that if it's too slow it will never display the results. But it does have the advantage that you can ignore many subsequent requests and implement some rate limiting as to not kill the API. In addition, if your daemon kills all existing connections when a new query comes in, there's no guarantee that it will have any old results to return (from a previously successful API call), and if it does, they may well be 5 or 6 iterations old, so you end up with the same situation as now: you're looking at the results for the first two characters of your query while typing the eighth character. Guess I should have explained it differently, I'm proposing to process everything from a queue. 
Not kill the existing connections. The situation is not the same as it is right now, though. A few advantages: - You can always return results in a timely manner, regardless of how fast the API is. - You can implement rate limiting for the API without losing too much responsiveness. - You can give users the choice to wait for a long time if they want to. You also need to worry about rate limiting, so you're not hammering an API (which may get you banned). Alfred's execution model works as a kind of ghetto rate limiter: your script won't hit the API again till the current request is done. The slower the API, the longer the pauses between requests. If you have a daemon initiating and killing connections on every keypress, you'll end up making several API requests/second. As a rule, the slower an API is, the less amenable it is to being hammered with requests to "speed it up". I never proposed concurrent requests or killing requests; that would indeed be bad. Link to comment
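As one possible reading of the "rate limiting for the API" point above, here is a minimal Python sketch of a limiter that lets a request through at most once per interval. Nothing here comes from wolph's post: the `MinIntervalLimiter` class, its `allow` method, and the injectable `clock` parameter are assumptions made for illustration.

```python
import time


class MinIntervalLimiter:
    """Allow at most one API call per `min_interval` seconds."""

    def __init__(self, min_interval, clock=time.monotonic):
        self._min_interval = min_interval
        self._clock = clock      # injectable so the limiter can be tested
        self._last_call = None   # time of the last allowed call

    def allow(self):
        """Return True if a call may be made now, and record it if so."""
        now = self._clock()
        if self._last_call is None or now - self._last_call >= self._min_interval:
            self._last_call = now
            return True
        return False
```

A long-running daemon could call `allow()` before each API request and simply keep the newest query waiting when it returns False.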
deanishe Posted December 19, 2014 Share Posted December 19, 2014 (edited) Guess I should have explained it differently, I'm proposing to process everything from a queue. Not kill the existing connections. The situation is not the same as it is right now, though. A few advantages: - You can always return results in a timely manner, regardless of how fast the API is. - You can implement rate limiting for the API without losing too much responsiveness. - You can give users the choice to wait for a long time if they want to. I'm not following what you're proposing at all. How can you always return timely results if you haven't fetched any yet? If you've got a bunch of queries backed up in a queue and only process the newest one, how is that different to what Alfred currently does? I never proposed concurrent requests or killing requests; that would indeed be bad. You didn't, but Shawn's post you were expanding on did talk about killing other processes. Also, your use of the term "requests" in "drops all other requests" led me to believe you were talking about HTTP requests. Edited December 19, 2014 by deanishe Link to comment
smarg19 Posted December 19, 2014 Share Posted December 19, 2014 My two cents: this discussion suggests the difficulty of a solution that solves every problem with great stability. Andrew's current implementation has the advantages of being immediately understood and thoroughly consistent. I, for one, have never gotten responses about slow response times, and in the workflows where I need it, all I'm doing is forcing a minimum number of characters before filtering actually begins (so, for example, ZotQuery won't start until it gets at least 3 characters) and caching results. For slow HTTP APIs (right now, only LibGen has this issue for me) I implement the period hack. In either case, it's an easy situation to code and to explain to users. It works the same every time. And it isn't a convoluted solution to an edge-case problem. The ultimate problem is that you can't write code to figure out when you've gotten a meaningful amount of input. Conceptually, you have only two options: [1] accept all input, or [2] force the user to tell you when they've entered the meaningful input. If you want to go route 1, all of these server solutions seem like slaying a dragon to kill a fly. Link to comment
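The two gating strategies smarg19 mentions can be sketched in a few lines of Python. This is an illustration only: `should_search` is a made-up helper, the threshold of 3 mirrors the ZotQuery example above, and the "period hack" is assumed here to mean that the user ends the query with a period to signal that they have finished typing.

```python
MIN_CHARS = 3  # assumed threshold, as in the ZotQuery example above


def should_search(query, min_chars=MIN_CHARS, require_period=False):
    """Decide whether a query is worth sending to the slow backend.

    Returns a (run, cleaned_query) tuple.
    """
    query = query.strip()
    if require_period:
        # Assumed "period hack": a trailing "." means "I'm done typing".
        if not query.endswith("."):
            return False, query
        query = query[:-1].rstrip()
    return len(query) >= min_chars, query
```

When `run` is False, the Script Filter would typically return a single placeholder item (for example, a prompt to keep typing) instead of calling the API.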
rice.shawn Posted December 19, 2014 Share Posted December 19, 2014 When it comes to re-initializing Alfred after a "slow query": I've never published a workflow with this model because I've never gotten it to work the way that I wanted. I'm not 100% convinced that it's a dead end for all use cases, but it's not something that I've put much work into recently. I have played around with putting locking mechanisms in there too, but that hasn't worked in any particular way that I want either. Well, "lock" isn't really the right word. Basically, you write some identifier to a file in a canonical location, and, when it comes time to send output, you display the results only if the running process matches the information in the file. So, that means that the last query typed would be the only one that ever returned results for the user. But, still, this didn't work out the way that I wanted it to. For the servers: they are, in some ways, overkill, but they're nice to process the data in the background. The way that I like to use them, however, is to make the workflow feel faster. Basically, for a lot of PHP workflows, even local ones, the workflow is slowed down mostly when loading PHP into memory. I mean, the script might run quickly once PHP has loaded, but it does take enough microseconds that there is a noticeable lag when using it. Starting a server means that the PHP binary has already been read into memory, thus removing that load-time for subsequent commands. That doesn't have as much to do with slow requests, but it does have to do with performance. Link to comment
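The "lock" idea described above (write an identifier to a well-known file, then only emit output if the identifier still matches) might look something like this in Python. It is a sketch under assumptions, not Shawn's code: the token file location and the `claim` / `is_latest` helpers are invented for illustration.

```python
import os
import tempfile
import uuid

# Hypothetical canonical location for the token file.
TOKEN_FILE = os.path.join(tempfile.gettempdir(), "workflow-latest-query.token")


def claim(token_file=TOKEN_FILE):
    """Record this run as the newest one and return its identifier."""
    token = uuid.uuid4().hex
    with open(token_file, "w") as fp:
        fp.write(token)
    return token


def is_latest(token, token_file=TOKEN_FILE):
    """True if no newer run has claimed the file since `token` was written."""
    try:
        with open(token_file) as fp:
            return fp.read() == token
    except OSError:
        return False
```

A script would call `claim()` on start-up, do its slow work, and print results only if `is_latest(token)` still holds. This only helps when runs can actually overlap, for example when background processes are involved.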
deanishe Posted December 19, 2014 Share Posted December 19, 2014 Yeah, PHP does seem to load more slowly than other languages. I guess that's because it's so monolithic. I can understand wanting a background server process for that reason. Link to comment
Florian Posted December 19, 2014 Share Posted December 19, 2014 (edited) Well, "lock" isn't really the right word. Basically, you write some identifier to a file in a canonical location, and, when it comes time to send output, you display the results only if the running process matches the information in the file. So, that means that the last query typed would be the only one that ever returned results for the user. But, still, this didn't work out the way that I wanted it to. I tried that too at some point but it's useless because alfred doesn't execute all queries in parallel: it waits for one to return before launching the next one. Edited December 19, 2014 by Florian Link to comment
rice.shawn Posted December 19, 2014 Share Posted December 19, 2014 Yeah, PHP does seem to load more slowly than other languages. I guess that's because it's so monolithic. I can understand wanting a background server process for that reason. The `du` command doesn't give the full picture, but it's at least indicative.

time python -c 'print "hello"'
real 0m0.032s
user 0m0.018s
sys 0m0.010s

time ruby -e 'print "test"'
real 0m0.045s
user 0m0.033s
sys 0m0.010s

time php -r 'print "test";'
real 0m0.097s
user 0m0.085s
sys 0m0.011s

du -h /usr/bin/python
16K /usr/bin/python

du -h /usr/bin/ruby
8.0K /usr/bin/ruby

du -h /usr/bin/php
3.5M /usr/bin/php

Each does run slower at first (maybe eight times as long); these are from subsequent runs. Link to comment
deanishe Posted December 19, 2014 Share Posted December 19, 2014 First/subsequent runs make no difference on my machine, so presumably the difference is due to the executable being in the HDD cache (I have an SSD). At any rate, I consistently get 0.03s, 0.04s and 0.13s for Python, Ruby and PHP respectively. That's terrible: most of my workflows finish in about 0.15s. Another reason not to use PHP, tbh. Link to comment
wolph Posted December 19, 2014 Share Posted December 19, 2014 There's actually a potential positive effect to using a daemon here: keep-alives. Not all languages close the HTTP connection properly, so the current approach could open too many connections. In many cases it's probably too complex, though. If anyone has a nice use case, I'm willing to give it a go. Link to comment
rice.shawn Posted December 19, 2014 Share Posted December 19, 2014 First/subsequent runs make no difference on my machine, so presumably the difference is due to the executable being in the HDD cache (I have an SSD). At any rate, I consistently get 0.03s, 0.04s and 0.13s for Python, Ruby and PHP respectively. That's terrible: most of my workflows finish in about 0.15s. Another reason not to use PHP, tbh. I finally put an SSD in my machine, so it's not that (maybe I just have a slower one). The first/subsequent runs are a matter of loading the program into the RAM, so if you've run any of them recently enough, then they'd already be there (so no difference). As far as I know, there shouldn't be any difference in that procedure between a mechanical HD and an SSD. Link to comment
rice.shawn Posted December 19, 2014 Share Posted December 19, 2014 There's actually a potential positive effect to using a daemon here: keep-alives. Not all languages close the HTTP connection properly, so the current approach could open too many connections. In many cases it's probably too complex, though. If anyone has a nice use case, I'm willing to give it a go. I've thought about the keep-alives, but I haven't had time to play with them enough. It would be interesting to see the results of some tests across the main languages (Bash, Ruby, PHP, Python) comparing a keep-alive daemon vs. a non-keep-alive daemon vs. no daemon and no keep-alive. I currently can't think of any good test cases. I'll let you know. Link to comment
Florian Posted December 19, 2014 Share Posted December 19, 2014 I did try to play around with keep-alives at some point, but APIs supporting them are just too rare, so I couldn't find a cool use case. So I have no numbers to show for it... But it's very easy to implement once you've got the daemon running. Link to comment
deanishe Posted December 19, 2014 Share Posted December 19, 2014 I finally put an SSD in my machine, so it's not that (maybe I just have a slower one). The first/subsequent runs are a matter of loading the program into the RAM, so if you've run any of them recently enough, then they'd already be there (so no difference). As far as I know, there shouldn't be any difference in that procedure between a mechanical HD and an SSD. Hmm. I guess PHP and Ruby must have been in RAM on my machine, too. Link to comment
Andrew Posted December 19, 2014 Share Posted December 19, 2014 Just to let you know, if I get a bit of time before the 2.6 / Remote release, I'm going to look into adding a few options for running script filters... 1. Default: Wait until last one has finished and then run the current query 2. Instantly kill current and run latest 3. Wait a short delay (i.e. finished typing), then kill current and run latest. This should hopefully help reduce a bit of complexity and bodging needed to perform what you are trying to achieve Cheers, Andrew Link to comment
deanishe Posted December 19, 2014 Share Posted December 19, 2014 Good news. 3. Wait a short delay (i.e. finished typing), then kill current and run latest. What would this mean precisely? Alfred waits until, say, 100ms after the user last pressed a key before (killing and) running the Script Filter? Also, how would you kill the process? With SIGTERM? Link to comment
bachya Posted December 19, 2014 Share Posted December 19, 2014 (edited) Just to let you know, if I get a bit of time before the 2.6 / Remote release, I'm going to look into adding a few options for running script filters... 1. Default: Wait until last one has finished and then run the current query 2. Instantly kill current and run latest 3. Wait a short delay (i.e. finished typing), then kill current and run latest. This should hopefully help reduce a bit of complexity and bodging needed to perform what you are trying to achieve Cheers, Andrew Yay, thank you Andrew. You have probably mentioned this elsewhere, but what's your rough timeline for the 2.6 release? Edited December 19, 2014 by Aaron B. Link to comment
rice.shawn Posted December 19, 2014 Share Posted December 19, 2014 Good news. What would this mean precisely? Alfred waits until, say, 100ms after the user last pressed a key before (killing and) running the Script Filter? Also, how would you kill the process? With SIGTERM? Killing with SIGTERM but with a specific code (obviously other than 9) would be best because then we could set some Bash traps to do a clean-up if necessary. Link to comment
deanishe Posted December 19, 2014 Share Posted December 19, 2014 SIGTERM means 15. 9 is SIGKILL. Link to comment
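To make the SIGTERM / SIGKILL distinction concrete: SIGTERM (15) can be caught so a script gets a chance to clean up, whereas SIGKILL (9) cannot be caught or ignored. The Python sketch below is the rough equivalent of a Bash trap; it assumes a POSIX system, and the `on_sigterm` handler and `cleaned_up` list are placeholders invented for this example. How Alfred would actually terminate a Script Filter is not settled in this thread.

```python
import signal
import sys

cleaned_up = []  # placeholder record of clean-up work, for illustration only


def on_sigterm(signum, frame):
    """Clean up, then exit with the conventional 128 + signal number."""
    cleaned_up.append(signum)  # stand-in for real clean-up (temp files, etc.)
    sys.exit(128 + signum)


# SIGTERM (15) can be handled; trying to install a handler for SIGKILL (9) fails.
signal.signal(signal.SIGTERM, on_sigterm)
```

With this handler installed, a `kill <pid>` (which sends SIGTERM by default) runs the clean-up, while `kill -9 <pid>` ends the process immediately with no clean-up at all.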