SQLite query object for workflows

dfay · January 12, 2018

Could we get native support for querying a SQLite database? It seems this is already being done in a lot of workflows (e.g. with Zotero, Mail, Ulysses) and/or is needed for updates (Notes). I am faced with the prospect of reverse-engineering one of these to try to write a workflow that would query a MaxQDA coding scheme. But the use of SQLite seems so widespread it might be helpful to include it natively in Alfred. Or if not in Alfred itself, in one of the workflow libraries (my vote is for python since @deanishe has probably done 90% of the work required already....).

nikivi · January 12, 2018

Here is one workflow that queries SQLite in Go. I am also experimenting with a workflow that uses SQLite. Native or library support for it would be really cool indeed.

deanishe · January 12, 2018

7 minutes ago, dfay said:

Could we get native support for querying a SQLite database?

I don't think that could offer a whole lot more than directly using the sqlite3 command-line tool (which outputs CSV). Support for JSON, perhaps?

You really want native support in the language you're using, so you can use native strings, numbers, date objects etc.

20 minutes ago, dfay said:

in one of the workflow libraries

I'm not sure there's a whole lot of broadly-useful middle ground between Python's native sqlite3 library and a full-blown ORM like peewee or SQLAlchemy.

peewee is tiny compared to SQLAlchemy (presumably where the name comes from), but it's still a big library and importing it noticeably slows down Script Filters.

SQL typically starts to get hairy when you're dealing with relationships between tables, especially many-to-many ones. If a library can't model relationships (and generate corresponding SQL for you), you're likely to be spending most of your time writing the SQL queries, anyway.

There are also SQL query builders for generating SQL queries, but not models. Personally, I find these are more work than just using SQL, at least when trying to read somebody else's database, as you end up first writing the query in SQL while you're testing it out in a GUI client, and then you have to translate it to the builder API.

deanishe · January 12, 2018

40 minutes ago, nikivi said:

Here is one workflow that queries SQLite in Go

It's not a good idea to use SQLite for something like that (in Go). It adds a large, non-native dependency (sqlite, which is C), and it isn't going to be meaningfully faster than storing the repos as JSON unless you have several thousand repos.

nikivi · January 12, 2018

17 minutes ago, deanishe said:

unless you have several thousand repos

I actually do have several thousand repos. ?

And this workflow is insanely fast. All it lacks is fuzzy searching. But I will add a PR for that soon. I hope it is possible.

Edited January 12, 2018 by nikivi

deanishe · January 12, 2018

No, you have 30 repos.

nikivi · January 12, 2018

25 minutes ago, deanishe said:

No, you have 30 repos.

I meant it in the context of the GitHub Jump workflow which searches repos users starred. In my case its a bit over 4,000 thus SQLite DB is really noticeable.

I have it run the `update` script every evening so it always keeps the index up-to-date and it's pretty amazing. Here is the macro that updates the index:

5a58bbcf4d548_2018-01-12at14_44.png.e865ef4c83fc280947d3529fcc062e3a.png

This one workflow means I have my entire GitHub world with less than 1 second access to open issues, view repo, make releases to my own repos, clone to a directory or view PR's. The author of the workflow didn't merge my PR that adds my bindings so I added the modified workflow to Small Workflows giving credit to original author.

I plan to add fuzzy search to it though so I can do queries like `face reactjs` to show me Reactjs repo because currently I have to type the entire name in full if there are many conflicting entries:

5a58bc90e1a79_2018-01-12at14_47.png.c2ba2667f6c29a017498fff28ba70cfd.png

Fuzzy searching would solve this. I was thinking of wrapping the workflow around AwGo to take use of what AwGo does with fuzzy searching but not sure how to combine it with the original workflow. Will try look at it again. Maybe I can extract the AwGo portion that does the fuzzy searching and apply it on the results somehow.

I also plan to do a similar workflow for GitHub users a person follows. To quickly jump to their profiles, search their repos, search their stars even (if API allows) and more.

Edited January 12, 2018 by nikivi

deanishe · January 12, 2018

Ah, right. My bad. I hadn't realised it also indexes your starred repos (the README just says "your repos").

In that case, then yeah, there is a decent reason to use SQLite.

But you'll lose all the performance of SQLite if you try to apply AwGo's fuzzy search to it.

nikivi · January 12, 2018

4 minutes ago, deanishe said:

But you'll lose all the performance of SQLite if you try to apply AwGo's fuzzy search to it

Yeah I thought that'd be the case. Oh well. The workflow is good as it is.

deanishe · January 12, 2018

4000 shouldn't be any kind of a problem for AwGo. The "trick" would be to use something faster than JSON to store the data. Even the built-in gob is 3x faster.

Edited January 12, 2018 by deanishe

deanishe · January 12, 2018

You can also simulate a fuzzy search with SQLite by interspersing your query with %. I'm pretty sure this is how Alfred's fuzzy search for applications and preference panes works.

Alfred keeps its cache of apps and pref panes in ~/Library/Application Support/Alfred 3/Databases/filecache.alfdb, which is a normal SQLite3 database. When I enter kotor as my query, Alfred finds Knights of the Old Republic.app and I'm pretty sure the query it's actually running to find this is:

SELECT * FROM `files` WHERE `files`.`nameSplit` LIKE '%k%o%t%o%r%'

The problem with doing this vs AwGo's fuzzy search is that the results aren't ranked in any way. But if Alfred's knowledge is turned on, it will do the ranking for you.

Edited January 12, 2018 by deanishe

deanishe · January 13, 2018

10 hours ago, deanishe said:

But you'll lose all the performance of SQLite if you try to apply AwGo's fuzzy search to it.

Forget this comment. It's completely wrong. This one was right:

15 hours ago, deanishe said:

It's not a good idea to use SQLite for something like that

I thought the workflow looked interesting, so I had a closer look at the source code.

It doesn't actually use SQLite for searching, just as a "dumb" datastore. It still loads every repo into memory on each run and filters it with go-alfred's MatchesTerms function.

This kills the main performance advantage of using SQLite (databases get much of their speed from not loading unwanted stuff into memory), so the only speed tradeoff would be AwGo's fuzzy search vs go-alfred's simpler "contains" search (which is still significant, but shouldn't be super-noticeable for ~4000 items).

I'm 99% certain that using an SQL query similar to what I posted above would be the fastest of all, but you'd have to change the database schema to include the full name of each repo, as that is what you want to search against, and it currently isn't stored in the database.

nikivi · January 13, 2018

I will try to use AwGo and its fuzzy searching Thank you a lot for your insight @deanishe

Edited January 13, 2018 by nikivi

h2ner · March 16, 2019

On 1/12/2018 at 12:22 AM, deanishe said:

peewee is tiny compared to SQLAlchemy (presumably where the name comes from), but it's still a big library and importing it noticeably slows down Script Filters.

The main workflow I use and perhaps the only one I spent any significant time on uses peewee. Agree native sqlite support might not be worth adding. It's configured to work on install using sqlite but optionally with postgresql & sphinx for full-text search. Currently searching 1 million+ records with sphinx and running at the terminal 'time' a script filter using your alfred-workflow returns results in ~0.15 seconds. I'm a novice at python, haven't profiled it in a while, haven't done any dev on it in a while, just got back to it, and am unsure if it's function calls, peewee initialization, db driver overhead, etc., I'm unsure what constitutes that time. I'm now looking at porting it to Go for more speed. It's not too bad but certainly faster is better. In a quick comparison of various ORMs, I found peewee easier to use and not to bad to write code to use various db backends.

Edited March 16, 2019 by h2ner

deanishe · March 17, 2019

It’s not the overhead of running Peewee that makes it slow, it’s importing it. Because Alfred typically tries to run Script Filters several time a second, the overhead of importing such a large library is noticeable.

If you use the SQLite module directly instead of an ORM, the workflow will be much more responsive.

h2ner · March 17, 2019

Indeed the SQLite module would be much more responsive. At the time a few years ago, I had tried to optimize it as much as I could using peewee and considered it a bit slow. Gotten used to it since then but now revisiting it, maybe an eventual port to Go would be worth it. I'd possibly still use some ORM or ORM-ish library thought it should be better.

deanishe · March 17, 2019

You won’t gain much by rewriting in Go if all the workflow does is search SQLite and return results.

Also, SQL is more complicated in Go due to its static typing.

Removing peewee is where the gains are at.

h2ner · March 17, 2019

I realize removing peewee would give me great gains. If it's something like .15–.2+ seconds to return results for a script filter, there is a visible delay. I'm pretty novice at programming and my workflow was meant to be able to work out of the box with sqlite, but usable with other databases such as postgresql. Full-text searches with relevancy ranking over two full text indices in two tables; after enough records postgresql FTS became slow so I added sphinx. Maybe I could figure it out without an ORM though there'd be more code, more to maintain, and so forth. I'm not sure if I'm ready to do that yet and maybe some lightweight ORM-like library in Go, though indeed more complicated as you say, may work better. There may be other reasons too to want to switch. My workflow works fairly well enough as is for me, though performance certainly could be better, though it's a good amount left to do before release whenever that happens.

SQLite query object for workflows

Recommended Posts

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Create an account or sign in to comment

Create an account

Sign in