
Deeply nested, fuzzy matches showing up as top results



I'm searching for a folder with the query "RTS", which I expect to return "~/Dropbox/RTS". I'm looking for an exact match on the target folder name.

 

However, the results I get are deeply nested, dozens of matches like:

"~/workspace/keyboards/qmk_firmware/lib/chibios/demos/STM32/RT-STM32F103-OLIMEX_STM32_P103"

 

... which is quite counterintuitive to me. In fact, most of my results are like this: deeply nested fuzzy matches ("RT-S" vs "RTS") dominate the results to the point that I can't find the simplest matches, which I expect to be the exact matches higher in the folder hierarchy or left-most in the path.

 

Is there a way to configure the matching to be more exact-match-first?

 

Alfred 4.3 [1205] + PowerPack on macOS Big Sur, 11.1.

 

Link to comment

@awmartin this is a very interesting case, as the macOS metadata server seems to be collapsing the hyphen in "RT-STM", and therefore makes no distinction in match accuracy. If you open Terminal.app and paste the following command:

 

mdfind kind:folder "rts"

 

You'll see that your RT-STM32F103-OLIMEX_STM32_P103 folder is found even though it doesn't directly match "rts".

 

If you scroll down to the RTS folder and select it a few times with "rts" typed in Alfred, it should be ranked to the top above the irrelevant results. To understand how Alfred subsequently ranks results after they are returned to him, take a look here:

https://www.alfredapp.com/help/kb/understanding-result-ordering/

 

Your best option for working within these limitations is to create a workflow File Filter specifically for folders and only include the search scope folders you're interested in. There is a built-in example for this under Workflow preferences > [+] > Examples > Simple Folder Search.

 

Cheers,

Andrew

Link to comment

Thanks for the reply! I'm familiar with most of this, except the workflows recommendation. In my case, there are so many false matches that I can't find the one I want, so I can't even select it to give it any preference (even with 40 results shown). But that's also only a solution for this particular query; I'd have to do this for nearly everything.

 

Contrary to what the Alfred help page says, I don't think the default ordering is even taking hold. When I execute `mdfind`, the second result is the one I want. In Alfred, it simply doesn't show up in the first 40 hits.

 

Also, when I set the sorting to "last modified," the results are still ordered incorrectly. I can rename the folder, then rename it back, and it's still not the top hit.

 

Overall, I think Alfred should be more opinionated about how results are shown, and that's not by usage by default, in my mind. But I think I'm unique in that I don't like fuzzy searches, especially if the fuzzy match is several folders deep. In my own apps, I prioritize search results by exact match at the start of titles, then exact matches by whole word, then go from there.
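
To make that ordering concrete, here is a minimal sketch in Python. It assumes the candidates are already available as plain path strings; the names `match_tier` and `rank` are made up for illustration and are not part of Alfred or of any macOS API.

```python
import re
from pathlib import PurePosixPath


def match_tier(query, path):
    """Return a rank tier for one path; lower is better.

    0: the last path component starts with the query
    1: the query appears as a whole word in the last component
    2: everything else (for example "RT-S..." for the query "RTS")
    """
    name = PurePosixPath(path).name.lower()
    q = query.lower()
    if name.startswith(q):
        return 0
    # Split the name into words on anything that is not a letter or digit.
    words = re.split(r"[^a-z0-9]+", name)
    if q in words:
        return 1
    return 2


def rank(query, paths):
    """Sort paths by tier; sorted() is stable, so ties keep their input order."""
    return sorted(paths, key=lambda p: match_tier(query, p))
```

With the query "RTS", "~/Dropbox/RTS" lands in tier 0, while the deeply nested "RT-STM32F103-OLIMEX_STM32_P103" folder lands in tier 2.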

 

Is there a way just to exclude all folders called "node_modules"? Or even simpler, just sort the results by path?

Link to comment
5 hours ago, awmartin said:

but I'm having trouble making this part of the default search behavior.

 

Better not to, tbh. Alfred is designed around the idea of multiple, tightly-focussed searches, and you’ll generally get better results if you use it that way. So rather than add folders or PDFs to the default scope, create a File Filter that only shows folders (with the keyword f, for example) and one that only shows PDFs (keyword pdf).

 

5 hours ago, awmartin said:

Is there a way just to exclude all folders called "node_modules"?

 

Unfortunately not. The way the indexer is set up makes this really awkward to do because you can only exclude paths during indexing, not during search.

 

The best solution is to use yarn, so you don’t have your dependencies in the project folder to begin with. Failing that, the only sane way is to write a script that finds and adds node_modules directories to Spotlight’s privacy list.
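
As a rough sketch of the "finds" half of such a script, the Python function below walks a directory tree and collects every node_modules directory without descending into it. The function name `find_dirs_named` is hypothetical, and how the resulting paths get added to Spotlight's privacy list is deliberately left out.

```python
import os
from typing import List


def find_dirs_named(root: str, target: str = "node_modules") -> List[str]:
    """Return the paths of all directories called `target` below `root`."""
    found = []
    for current, dirnames, _filenames in os.walk(root):
        if target in dirnames:
            found.append(os.path.join(current, target))
            # Prune the walk: nested copies inside the match are already
            # covered by excluding the outer directory.
            dirnames.remove(target)
    return sorted(found)
```

Calling `find_dirs_named(os.path.expanduser("~/workspace"))` would list one path per project-level node_modules folder, which is the set you would want excluded from indexing.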

 

Link to comment
15 hours ago, deanishe said:

Alfred is designed around the idea of multiple, tightly-focussed searches

Well, my example is one where a tightly focused search isn't possible, because of the explosion of results by default. So I had to remove search space by excluding an entire folder of work from the index, because Alfred defers order to the system outside of its control, rather than deploying a pattern of intuition.

 

I've used Alfred for years, so while I'm disappointed that something so simple seems out of scope, I've got it for now. It just means Alfred can't handle the use case of having hundreds of folders and files that are intermingled with the logical results.

 

 

Link to comment
20 hours ago, awmartin said:

Well, my example is one where a tightly focused search isn't possible, because of the explosion of results by default.

 

I think you misunderstand what I mean. You avoid the "explosion of results" by limiting the scope of the search to a manageable number of folders.

Link to comment
9 hours ago, deanishe said:

limiting the scope of the search to a manageable number of folders

 

...or with a more intuitive sorting of results.

 

Folders like node_modules and bower_components and the like make limiting the number of folders inconvenient for developers. Alfred simply doesn’t support this use case efficiently. And it’s totally fine to limit the audience for a product. Value and implementation cost have to be balanced.

Link to comment
16 hours ago, awmartin said:

Alfred simply doesn’t support this use case efficiently.

 

Not Alfred. macOS's metadata search API. Fundamentally, the API doesn't provide a way to rank results by match quality: it's purely match/no match. It also doesn't provide a way to exclude certain results from a search, only from the index itself.

 

It's an important distinction because Alfred's hands are rather tied here: it can't make macOS's search behave in a fundamentally different way. It's not feasible for Alfred to retrieve and post-process the full set of search results.

 

node_modules is basically a worst-case-scenario for an API that works the way macOS’s does, and there isn't any good way to make the results not be 99+% unwanted.

 

For that particular case, a custom Script Filter based on a git-aware tool like ripgrep will work much better.
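
A minimal sketch of what the output side of such a Script Filter could look like, in Python. It assumes the file list comes from something like `rg --files` (which skips git-ignored paths such as node_modules when they are listed in .gitignore) and emits the `items` JSON that Alfred's Script Filter input reads. The function name `script_filter_json` and the exact-name matching rule are illustrative choices, not anything Alfred prescribes.

```python
import json
from pathlib import PurePosixPath


def script_filter_json(query, file_paths):
    """Build Script Filter JSON for folders whose name equals the query.

    `file_paths` is any iterable of file paths; the folders are derived
    from the parent directories of those files.
    """
    q = query.lower()
    dirs = set()
    for f in file_paths:
        for parent in PurePosixPath(f).parents:
            if parent.name.lower() == q:
                dirs.add(str(parent))
    # Shallowest folders first, then alphabetical for a stable order.
    ordered = sorted(dirs, key=lambda d: (d.count("/"), d))
    items = [
        {"title": PurePosixPath(d).name, "subtitle": d, "arg": d}
        for d in ordered
    ]
    return json.dumps({"items": items})
```

In a real workflow the script would print this string to stdout, and the paths would need to be absolute for later actions to open them.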

Edited by deanishe
Link to comment
46 minutes ago, deanishe said:

Alfred's hands are rather tied here

 

I think this is a false assumption, and this part:

 

46 minutes ago, deanishe said:

Not Alfred. macOS's metadata search API

 

... is rather exactly the point. Alfred is relying on this metadata search API too much, and passing the buck, and the user experience suffers. I don't think users should have to know the underlying implementation and work around what's effectively an excuse for bad results.

 

Quote

It's not feasible for Alfred to retrieve and post-process the full set of search results.

 

Really? Does the metadata search always return too many results to handle?

 

Why put Alfred's UX at the mercy of the metadata search alone, something that clearly has shifting behaviors? In my example, it treats "RTS" and "RT-S" identically. Why would we want that as users? Products should be designed with the end users' mindsets and goals in mind, not the metadata search's quirks. Of course, I'm just one lone user, but if performance is the issue, I would make a different performance vs feature tradeoff. While it's always a careful decision to defer choices to users, does this occur enough that it's worth the feature? "Hey, we'll give you more control, but Alfred will be a bit slower."

 

I think there are potentially low-cost ways for Alfred to take these results from the metadata search and present them in a more intuitive order by default. In the case of folders, these might work for my use case:

  • sort them by path string
  • exclude any path with "node_modules" anywhere in the path, treated as a string
  • sort by the length of the path string, so shorter ones show up first
  • sort by the number of slashes (or by depth), making "~/Documents" first, then "~/Blah/Documents" show up second

These are relatively low cost ways of getting to a more intuitive set of results and would work for me.
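
The suggestions above can be sketched in a few lines of Python. This is only an illustration on a list of path strings that has already been retrieved, so it says nothing about the cost of retrieving them; `postprocess` is a made-up name.

```python
def postprocess(paths, excluded="node_modules"):
    """Drop paths containing `excluded`, then sort shallow and short first."""
    # Plain substring test, as in "treated as a string".
    kept = [p for p in paths if excluded not in p]
    # Sort by depth (number of slashes), then path length, then the path itself.
    return sorted(kept, key=lambda p: (p.count("/"), len(p), p))
```

For example, given "~/Blah/Documents", "~/Documents" and "~/proj/node_modules/Documents", it keeps the first two and puts "~/Documents" first.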

 

Edited by awmartin
Referring to another quote in previous response
Link to comment
14 hours ago, awmartin said:

I think this is a false assumption

 

It isn't an assumption. It's a statement based on my own experiences working with the APIs.

 

14 hours ago, awmartin said:

These are relatively low cost ways of getting to a more intuitive set of results and would work for me.

 

They are not "low cost". The API doesn't return metadata, it returns proxy objects. Retrieving the metadata via the proxies is a separate step, and fetching 10,000 items takes 100x longer than fetching 100. It's fast, but not so fast that loading thousands of additional results isn't noticeably slower.

 

All the features you suggest would be great. They've all been suggested several times before (including by me), and I know they're on Andrew's list. But you're over-trivialising the performance costs.

 

21 hours ago, awmartin said:

Products should be designed with the end users' mindsets and goals in mind, not the metadata search's quirks.

 

Performance-critical characteristics aren't "quirks". What you're talking about could add seconds to the search time in the worst cases. That isn't an obvious improvement.

 

Some sort of post-processing may be added when Andrew thinks Macs are fast enough. Problem is, everybody wants to use it to filter out node_modules, which is pretty much a worst-case scenario, so the performance would have to be really good.

Link to comment

I don't feel we're connecting. When I said "quirks," I wasn't talking about performance characteristics, I was talking about the metadata search treating "RTS" and "RT-S" as the same, which I gleaned was a surprise, given the first response to my inquiry.

 

I don't know the implementation costs, obviously, and I don't mean to overtrivialize, so please don't characterize me as purposefully doing so. I'm just making a user-oriented argument for why wrangling with the complexity is worth it, but as I noted before, I'll deal.

 

Edited by awmartin
"treats" to "treating"
Link to comment
8 hours ago, awmartin said:

I'm just making a user-oriented argument for why wrangling with the complexity is worth it

 

I understand. And I'm trying to put some numbers on the other side of the cost–benefit equation because that's where the problem is.

 

Your case regarding the benefits of additional filtering/sorting has been made many times before (just search the forum for "node_modules"). Everyone basically already agrees it would be awesome.

 

To actually get the feature, we have to persuade Andrew that it's no longer unacceptably slow.

Link to comment

I think there's a way to achieve some of these goals without changing the search, just the rendering in the ViewController.

 

For me, the bottom line is:

  1. Alfred is returning incorrect results. This is a bug. Results for "RTS" are being polluted with "RT-S". I don't think "Well, it's the metadata server" is a good reason for why (from a user's perspective, while it might be a good root cause for the developer).
  2. Those polluted results are preventing me from finding the most obvious, most intuitive results, which to me in this use case are the highest in the folder hierarchy. There are other rubrics by which a user might consider results as intuitive.

I have a workaround for 2, but I don't for 1.

Edited by awmartin
clarification on to whom the problem matters
Link to comment
33 minutes ago, awmartin said:

just the rendering in the ViewController

 

How do you think that would work? How will you get correct results for "sort by shortest path" if you're still only retrieving a partial result set from the API and it's sorted by date or filename?

Link to comment

You are really trivialising this conundrum with "just the rendering in the ViewController".

 

I'll clarify... the file search API provided by macOS doesn't differentiate between RT-S and RTS, so in your specific case, what's to say that the metadata server won't return me the "RTS" exact match until the 100,000th result... That would mean that for every single typed character, Alfred would have to load 100k results JUST IN CASE there was an exact match somewhere down there. The cost benefit of this is absolutely not worth it.

 

The hyphen handling behind this is an unfortunate issue within the search API and, interestingly, something which has changed over various iterations of macOS. A few major versions back, Apple didn't treat a hyphen as a word break, which caused much larger issues than this. If the file were "RT S" vs "RTS", without the hyphen, then it would work correctly.

 

The reason this isn't a bug in Alfred is because sorting results is much more complex than exact match at the top. For the most useful results, the top matches should evolve over time, and be learnt. Alfred WILL learn what you want up the top, which is why I gave you this link:

 

https://www.alfredapp.com/help/kb/understanding-result-ordering/

 

Cheers,

Andrew

Link to comment
21 hours ago, Andrew said:

Alfred WILL learn what you want up the top

 

No, it won't. I can't select the result (yes, I read the link; selection is required according to 1 and 2) because I can't find it, because of the number of matches, because Alfred is not matching properly.

 

21 hours ago, Andrew said:

The reason this isn't a bug in Alfred is because sorting results is much more complex than exact match at the top.

 

It's a bug from a user's perspective because the matching isn't right. The returned set of results is incorrect.

 

But I can see you've made up your minds. I thought it might be helpful to point out the results are wrong and try to work through solutions, but I don't know the implementation constraints. I'm just a random user. I'm good. Sorry to bother.

Edited by awmartin
plurals
Link to comment
6 minutes ago, awmartin said:

No, it won't. I can't select the result

 

Part 3 of this link: https://www.alfredapp.com/help/kb/understanding-result-ordering/ 

 

Quote

3. Default File Ordering

When Alfred has no internal knowledge of your file usage, the macOS metadata index has a couple of very useful fields which help the default result sorting. These are "Last Used", and "Last Modified", and are automatically maintained during normal usage of your Mac, both inside and outside of Alfred.

 

These metadata flags are the API's way of ranking files. Open Terminal, and type:

 

open ~/Dropbox/RTS

 

If your metadata is functioning correctly, then the kMDItemUsedDates metadata flag will be updated, and the macOS API should then rank this RTS folder above all others, so it should be returned in Alfred's result set.

 

If you're still not seeing the folder in Alfred's results, your underlying index may in fact be malfunctioning. Take a look at this link for diagnostics on fixing this:

 

https://www.alfredapp.com/help/troubleshooting/indexing/

Link to comment
1 minute ago, deanishe said:

I’ve spent the last week explaining them to you in this thread…

 

Sigh... Which you didn't really need to do in depth. I'm just a sole user and "can't fix" would suffice. Sorry to bother. Thanks for doing this.

 

6 minutes ago, Andrew said:

These metadata flags are the API's way of ranking files. Open Terminal, and type:

 


open ~/Dropbox/RTS

 

Thanks. I'll give this a try.

 

I can tell I'm frustrating you both, and you're kind enough to still engage. While I now understand Alfred's model for results ordering makes filtering and additional sorting difficult and particularly performance-constrained, I still don't understand why a search for "RTS" being polluted with results containing "RT-S" isn't considered a bug from a user and product perspective. Wouldn't you all not want "RT-S" results to show up? Isn't this undesirable? If the answer is, "Yes, that's valid, but can't fix," then that's an OK response for a frustrated customer. If not valid, I guess I'll deal with the coming frustration of bad results or, worst case, find another tool, which I'd rather not do, because Alfred is great.

 

Link to comment
30 minutes ago, awmartin said:

I still don't understand why a search for "RTS" being polluted with results containing "RT-S" isn't considered a bug from a user and product perspective

 

I completely agree that in this case, if you're wanting RTS to match at the top of Alfred's results, then it absolutely should be at the top in Alfred.

 

Alfred provides as much as possible the mechanism to achieve this, but unfortunately, if macOS is not returning this folder as a result to Alfred when he does a query to the metadata API, Alfred has no idea that the folder "RTS" even exists, let alone how to sort it in your result list.

 

If you can somehow get RTS into Alfred's results by prioritising it in the macOS usage data (try that open trick I suggested), then you should be able to get it up to the top in Alfred.

 

 

Link to comment
5 hours ago, awmartin said:

I still don't understand why a search for "RTS" being polluted with results containing "RT-S" isn't considered a bug from a user and product perspective.

  

5 hours ago, awmartin said:

If the answer is, "Yes, that's valid, but can't fix."

 

This. Nobody is arguing that it's not a bug. We are explaining why fixing the problem on Alfred's side is not feasible.

Edited by deanishe
Link to comment
22 hours ago, Andrew said:

Alfred provides as much as possible the mechanism to achieve this, but unfortunately, if macOS is not returning this folder as a result to Alfred when he does a query to the metadata API, Alfred has no idea that the folder "RTS" even exists, let alone how to sort it in your result list.

 

If you can somehow get RTS into Alfred's results by prioritising it in the macOS usage data (try that open trick I suggested), then you should be able to get it up to the top in Alfred.

 

I see. So inferring, Alfred already trims the results list to 20 or 40 at search time.

 

I'm wondering about similar situations in which any search would be polluted similarly. "ABC" having results like "A-B-C", for example. It would be tedious to prime each query if it happens a lot, but if it doesn't, this is manageable. I'll do my best.

 

@deanishe I didn't get the impression you acknowledged it was a bug, either. The impression I got was this is normal, expected behavior. But now we've clarified. Thanks for your explanations.

Link to comment
39 minutes ago, awmartin said:

Alfred already trims the results list to 20 or 40 at search time.


No, it doesn’t trim the results. As I’ve been trying to tell you, Alfred does not fetch more results than it uses. That’s why it’s so fast and why what you’ve been talking about (fetching all results) would be extremely slow.

 

42 minutes ago, awmartin said:

I didn't get the impression you acknowledged it was a bug


I told you repeatedly that everything you suggested was desirable but ultimately not doable.

Link to comment

I'm not a fan of these kinds of exchanges. We have a communication disconnect. Thanks for your help. And again, not intending to be difficult, just trying to understand and be helpful. Apologies for the frustrations I've caused.

Edited by awmartin
Link to comment
