Miguel Tavares Posted January 24, 2021

Hi! I've put together a workflow to interact with Obsidian. It relies heavily on some Python libraries like NLTK and Gensim, and I've done my best to package them, following advice from @deanishe found elsewhere on this forum. However, I get an error when I try to run it on a different computer:

```
import regex._regex as _regex
ModuleNotFoundError: No module named 'regex._regex'
```

From what I've gathered, that may have something to do with the fact that Regex (required by NLTK) ships with a precompiled binary that may not play well with the other system or its Python interpreter. All my scripts have a shebang pointing to `/usr/bin/python3`, and I'm comfortable with that as a minimum requirement, but is there any hope of properly packaging Regex with the workflow?
deanishe Posted January 24, 2021

1 hour ago, Miguel Tavares said:
From what I've gathered, that may have something to do with the fact that Regex (required by NLTK) ships with a precompiled binary that may not play well with the other system or its Python interpreter.

This. The name of the binary is `_regex.cpython-37m-darwin.so`, which means it's only compatible with Python 3.7. The same goes for scipy, numpy and gensim.
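A quick way to see the mismatch is to ask an interpreter which extension suffix it expects and compare that with the filename of the bundled `.so` (a minimal sketch, standard library only):

```python
import sysconfig

# The ABI tag the running interpreter expects for compiled extension modules.
# Under a Python 3.7 install this prints ".cpython-37m-darwin.so"; under the
# /usr/bin/python3 shipped with Catalina/Big Sur it prints
# ".cpython-38-darwin.so", so a module compiled for 3.7 simply won't import.
print(sysconfig.get_config_var("EXT_SUFFIX"))
```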
Miguel Tavares Posted January 24, 2021

Thanks, @deanishe! Any tips on how I could package the workflow neatly and make it work with the default Python 3 on macOS?
deanishe Posted January 25, 2021

Short term, you could make sure you install the dependencies in the workflow with Python 3.8, which is what Catalina and Big Sur have, but that will likely break with the next version of macOS. And I presume you have 3.7 installed for some reason (3.7 was never part of macOS, AFAIK).

Personally, I'd rethink the workflow. It's 300 MB, which is nuts. I have over 250 workflows, and yours is larger than all of them combined. What are you doing with gensim and NLTK? Couldn't you use sqlite (included with Python) for full-text search instead?
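On the "install the dependencies in the workflow" point, the usual pattern is to pip-install into a folder inside the workflow with the same interpreter the scripts run under, e.g. `/usr/bin/python3 -m pip install --target=lib regex nltk gensim`, and then put that folder on the import path. A minimal sketch, assuming a folder named `lib` (the name is an assumption, not an Alfred convention):

```python
import os
import sys

# Make the bundled dependencies importable; "lib" is the folder the packages
# were pip-installed into with --target (an assumed name, not anything from
# this thread).
sys.path.insert(0, os.path.join(os.path.dirname(os.path.abspath(__file__)), "lib"))

import regex  # noqa: E402 -- now resolved from ./lib, not from site-packages
```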
Miguel Tavares Posted January 25, 2021

You're right. The workflow is bigger than Obsidian itself. There's a "Related Notes" feature that looks for similar text files. It initially used the simpler Jaccard similarity algorithm, but then I thought, "Hey, let's do it properly with TF-IDF." So I imported NLTK for word tokenisation and Gensim for vectorisation. It worked fine (on my computer), but everything else went sideways.

I don't know a thing about sqlite (or even what role a database would have in this feature), so I'll probably just go back to Jaccard and try to keep things simple. Thanks for your help and advice.
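For reference, the Jaccard approach really is only a few lines (a rough sketch with deliberately naive tokenisation, no stemming or stop words):

```python
import re


def tokens(text):
    # lowercase word characters only; no stemming, no stop-word removal
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    # size of the intersection divided by the size of the union of the token sets
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


# "Related notes" = every other file ranked by similarity to the current one:
# ranked = sorted(notes, key=lambda p: jaccard(current_text, read(p)), reverse=True)
```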
deanishe Posted January 25, 2021

5 hours ago, Miguel Tavares said:
I don't know a thing about sqlite (or even what role a database would have in this feature)

It has fairly advanced full-text search capabilities, including Porter stemming, and is extremely fast. As I said, I don't really understand what you're doing with gensim and NLTK.
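For a concrete picture of the FTS5 side, a minimal sketch using Python's built-in sqlite3 module (the table layout and file names are invented for illustration, and FTS5 has to be present in the underlying SQLite build):

```python
import sqlite3

con = sqlite3.connect("notes.db")

# An FTS5 virtual table using the built-in Porter stemming tokenizer.
con.execute(
    "CREATE VIRTUAL TABLE IF NOT EXISTS notes "
    "USING fts5(path, title, body, tokenize='porter')"
)
con.execute(
    "INSERT INTO notes VALUES (?, ?, ?)",
    ("linking.md", "Linking notes", "Connected thinking emerges from linked notes."),
)
con.commit()

# Thanks to stemming, the query term "connect" matches "Connected".
for path, title in con.execute(
    "SELECT path, title FROM notes WHERE notes MATCH ? ORDER BY rank",
    ("connect",),
):
    print(path, title)
```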
Miguel Tavares Posted January 25, 2021

Just now, deanishe said:
As I said, I don't really understand what you're doing with gensim and NLTK.

Neither do I, apparently. I had no idea sqlite did those things. I'll be looking into it after all. Cheers!
deanishe Posted January 25, 2021

Just now, Miguel Tavares said:
I had no idea sqlite did those things.

I have no idea if it's any use for your "related documents" feature.
Miguel Tavares Posted January 25, 2021

2 minutes ago, deanishe said:
I have no idea if it's any use for your "related documents" feature.

It probably is, I think. If it has Porter stemming (something I'd never have imagined), I assume that stop-word removal will be trivial. Then I can just feed the full-text search the "input document" and see what comes out. Right?
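A rough sketch of that idea (the stop-word list and the cap on terms are arbitrary choices here, not anything SQLite provides): pull the words out of the current note, drop the noise, and join the rest with OR so any overlap can match, leaving the ranking to FTS5.

```python
import re

# A deliberately tiny stop-word list; a real one would be much longer.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it", "for", "on", "with"}


def more_like_this_query(text, max_terms=30):
    # crude word extraction -- FTS5's porter tokenizer handles stemming later
    words = re.findall(r"[a-z]+", text.lower())
    terms = [w for w in dict.fromkeys(words) if w not in STOPWORDS]
    # join distinct terms with OR so any overlap produces a candidate match
    return " OR ".join(terms[:max_terms])


# rows = con.execute(
#     "SELECT path FROM notes WHERE notes MATCH ? ORDER BY rank LIMIT 10",
#     (more_like_this_query(current_note_text),),
# )
```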
deanishe Posted January 25, 2021

3 minutes ago, Miguel Tavares said:
I assume that stop-word removal will be trivial

Pretty sure it doesn't support stop words. You might need to set your own custom tokeniser for that.

Here are the full-text search docs: https://www.sqlite.org/fts5.html

I usually use a custom ranking function to apply different weightings to different columns (e.g. title vs tags vs body).
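deanishe's actual ranking function isn't shown in the thread; one built-in way to get a similar effect is FTS5's `bm25()`, which accepts one weight per column in table-declaration order. A sketch, continuing the assumed `path, title, body` layout from the earlier example:

```python
import sqlite3

con = sqlite3.connect("notes.db")  # the same database as in the sketch above

# bm25() takes one weight per column: "path" is ignored, and a hit in "title"
# counts ten times as much as one in "body". Lower bm25 scores mean better
# matches, so a plain ascending sort puts the best results first.
rows = con.execute(
    "SELECT path, title, bm25(notes, 0.0, 10.0, 1.0) AS score "
    "FROM notes WHERE notes MATCH ? ORDER BY score LIMIT 10",
    ("connect OR linked",),
)
for path, title, score in rows:
    print(f"{score:8.3f}  {path}  {title}")
```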
Miguel Tavares Posted January 25, 2021

50 minutes ago, deanishe said:
Here are the full-text search docs: https://www.sqlite.org/fts5.html

Thanks! This looks promising.

51 minutes ago, deanishe said:
I usually use a custom ranking function to apply different weightings to different columns (e.g. title vs tags vs body).

That's precisely one of the things I'd like to implement.
Miguel Tavares Posted January 29, 2021

Hey, @deanishe! I just wanted to let you know that I found this awesome Python module and, with just a few tweaks, I've managed to plug it into my workflow. It's now around 400 kB (!!). Thank you for pointing me in the right direction. SQLite was indeed the way to go. Cheers!