Miguel Tavares Posted January 24, 2021

Hi! I've put together a workflow to interact with Obsidian. It relies heavily on some Python libraries like NLTK and Gensim, and I've done my best to package them, following advice from @deanishe found elsewhere on this forum. However, I get an error when I try to run it on a different computer:

```
import regex._regex as _regex
ModuleNotFoundError: No module named 'regex._regex'
```

From what I've gathered, that may have something to do with the fact that Regex (required by NLTK) ships with a precompiled binary that may not play well with the other system or its Python interpreter. All my scripts have a shebang pointing to `/usr/bin/python3`, and I'm comfortable with that as a minimum requirement, but is there any hope of properly packaging Regex with the workflow?
deanishe Posted January 24, 2021

1 hour ago, Miguel Tavares said:
From what I've gathered, that may have something to do with the fact that Regex (required by NLTK) ships with a precompiled binary that may not play well with the other system or its Python interpreter.

This. The name of the binary is `_regex.cpython-37m-darwin.so`, which means it's only compatible with Python 3.7. The same goes for scipy, numpy and gensim.
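A quick way to see the mismatch is to ask an interpreter which extension suffix it expects and compare that with the filename of the bundled `.so` (a minimal sketch, standard library only):

```python
import sysconfig

# The ABI tag the running interpreter expects for compiled extension modules.
# Under a Python 3.7 install this prints ".cpython-37m-darwin.so"; under the
# /usr/bin/python3 shipped with Catalina/Big Sur it prints
# ".cpython-38-darwin.so", so a module compiled for 3.7 simply won't import.
print(sysconfig.get_config_var("EXT_SUFFIX"))
```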
Miguel Tavares Posted January 24, 2021

Thanks, @deanishe! Any tips on how I could package the workflow neatly and make it work with the default Python 3 on macOS?
deanishe Posted January 25, 2021

Short term, you could make sure you install the dependencies in the workflow with Python 3.8, which is what Catalina and Big Sur have, but that will likely break with the next version of macOS. And I presume you have 3.7 installed for some reason (3.7 was never part of macOS, AFAIK).

Personally, I'd rethink the workflow. It's 300 MB, which is nuts. I have over 250 workflows, and yours is larger than all of them combined. What are you doing with gensim and NLTK? Couldn't you use sqlite (included with Python) for full-text search instead?
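On the "install the dependencies in the workflow" point, the usual pattern is to pip-install into a folder inside the workflow with the same interpreter the scripts run under, e.g. `/usr/bin/python3 -m pip install --target=lib regex nltk gensim`, and then put that folder on the import path. A minimal sketch, assuming a folder named `lib` (the name is an assumption, not an Alfred convention):

```python
import os
import sys

# Make the bundled dependencies importable; "lib" is the folder the packages
# were pip-installed into with --target (an assumed name, not anything from
# this thread).
sys.path.insert(0, os.path.join(os.path.dirname(os.path.abspath(__file__)), "lib"))

import regex  # noqa: E402 -- now resolved from ./lib, not from site-packages
```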
Miguel Tavares Posted January 25, 2021

You're right. The workflow is bigger than Obsidian itself. There's a "Related Notes" feature that looks for similar text files. It initially used the simpler Jaccard similarity algorithm, but then I thought, "Hey, let's do it properly with TF-IDF." So I imported NLTK for word tokenisation and Gensim for vectorisation. It worked fine (on my computer), but everything else went sideways.

I don't know a thing about sqlite (or even what role a database would have in this feature), so I'll probably just go back to Jaccard and try to keep things simple. Thanks for your help and advice.
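For reference, the Jaccard approach really is only a few lines (a rough sketch with deliberately naive tokenisation, no stemming or stop words):

```python
import re


def tokens(text):
    # lowercase word characters only; no stemming, no stop-word removal
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    # size of the intersection divided by the size of the union of the token sets
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


# "Related notes" = every other file ranked by similarity to the current one:
# ranked = sorted(notes, key=lambda p: jaccard(current_text, read(p)), reverse=True)
```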
deanishe Posted January 25, 2021

5 hours ago, Miguel Tavares said:
I don't know a thing about sqlite (or even what role a database would have in this feature)

It has fairly advanced full-text search capabilities, including Porter stemming, and is extremely fast. As I said, I don't really understand what you're doing with gensim and NLTK.
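For a concrete picture of the FTS5 side, a minimal sketch using Python's built-in sqlite3 module (the table layout and file names are invented for illustration, and FTS5 has to be present in the underlying SQLite build):

```python
import sqlite3

con = sqlite3.connect("notes.db")

# An FTS5 virtual table using the built-in Porter stemming tokenizer.
con.execute(
    "CREATE VIRTUAL TABLE IF NOT EXISTS notes "
    "USING fts5(path, title, body, tokenize='porter')"
)
con.execute(
    "INSERT INTO notes VALUES (?, ?, ?)",
    ("linking.md", "Linking notes", "Connected thinking emerges from linked notes."),
)
con.commit()

# Thanks to stemming, the query term "connect" matches "Connected".
for path, title in con.execute(
    "SELECT path, title FROM notes WHERE notes MATCH ? ORDER BY rank",
    ("connect",),
):
    print(path, title)
```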
Miguel Tavares Posted January 25, 2021

Just now, deanishe said:
As I said, I don't really understand what you're doing with gensim and NLTK.

Neither do I, apparently. I had no idea sqlite did those things. I'll be looking into it after all. Cheers!
deanishe Posted January 25, 2021

Just now, Miguel Tavares said:
I had no idea sqlite did those things.

I have no idea if it's any use for your "related documents" feature.
Miguel Tavares Posted January 25, 2021

2 minutes ago, deanishe said:
I have no idea if it's any use for your "related documents" feature.

It probably is, I think. If it has Porter stemming (something I'd never have imagined), I assume that stop-word removal will be trivial. Then I can just feed the full-text search the "input document" and see what comes out. Right?
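A rough sketch of that idea (the stop-word list and the cap on terms are arbitrary choices here, not anything SQLite provides): pull the words out of the current note, drop the noise, and join the rest with OR so any overlap can match, leaving the ranking to FTS5.

```python
import re

# A deliberately tiny stop-word list; a real one would be much longer.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it", "for", "on", "with"}


def more_like_this_query(text, max_terms=30):
    # crude word extraction -- FTS5's porter tokenizer handles stemming later
    words = re.findall(r"[a-z]+", text.lower())
    terms = [w for w in dict.fromkeys(words) if w not in STOPWORDS]
    # join distinct terms with OR so any overlap produces a candidate match
    return " OR ".join(terms[:max_terms])


# rows = con.execute(
#     "SELECT path FROM notes WHERE notes MATCH ? ORDER BY rank LIMIT 10",
#     (more_like_this_query(current_note_text),),
# )
```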
deanishe Posted January 25, 2021

3 minutes ago, Miguel Tavares said:
I assume that stop-word removal will be trivial

Pretty sure it doesn't support stop words. You might need to set your own custom tokeniser for that.

Here are the full-text search docs: https://www.sqlite.org/fts5.html

I usually use a custom ranking function to apply different weightings to different columns (e.g. title vs tags vs body).
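deanishe's actual ranking function isn't shown in the thread; one built-in way to get a similar effect is FTS5's `bm25()`, which accepts one weight per column in table-declaration order. A sketch, continuing the assumed `path, title, body` layout from the earlier example:

```python
import sqlite3

con = sqlite3.connect("notes.db")  # the same database as in the sketch above

# bm25() takes one weight per column: "path" is ignored, and a hit in "title"
# counts ten times as much as one in "body". Lower bm25 scores mean better
# matches, so a plain ascending sort puts the best results first.
rows = con.execute(
    "SELECT path, title, bm25(notes, 0.0, 10.0, 1.0) AS score "
    "FROM notes WHERE notes MATCH ? ORDER BY score LIMIT 10",
    ("connect OR linked",),
)
for path, title, score in rows:
    print(f"{score:8.3f}  {path}  {title}")
```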
Miguel Tavares Posted January 25, 2021

50 minutes ago, deanishe said:
Here are the full-text search docs: https://www.sqlite.org/fts5.html

Thanks! This looks promising.

51 minutes ago, deanishe said:
I usually use a custom ranking function to apply different weightings to different columns (e.g. title vs tags vs body).

That's precisely one of the things I'd like to implement.
Miguel Tavares Posted January 29, 2021

Hey, @deanishe! I just wanted to let you know that I found this awesome Python module and, with just a few tweaks, I've managed to plug it into my workflow. It's now around 400 kB (!!). Thank you for pointing me in the right direction. SQLite was indeed the way to go. Cheers!