AlfredOCR - Optical Character Recognition


Posted (edited)

Alfred OCR

I noticed that Apple's Vision framework finally produces some usable results.

This means: No external dependencies are required to perform the OCR.

 

Download on GitHub

 


Alfred OCR Light

 

The workflow allows you to copy text from images using optical character recognition.

Take a snapshot with your mouse or trackpad to automatically copy the recognized text to the clipboard. 


 


Alfred OCR+

 

The workflow allows you to copy text from images, or to convert PDF files into searchable PDF documents
using optical character recognition, and to apply compression to PDF documents.

 

1 / Snapshot
Take a snapshot with your mouse or trackpad to automatically copy the recognized text to the clipboard. 

  • Default shortcut: ⌘+⇧+6 
  • Default keyword: ocr

 

2 / PDF Document

  • To convert a PDF into a searchable PDF document, pass it to the workflow’s Universal Action
    • To compress the resulting PDF, pass the source document on while pressing the ⌘+⇧ keys.
    • To open the resulting PDF, pass the source document on while pressing the ⌥+⇧ keys.
    • To force the replacement of a source document, pass it on while pressing the ⌥+⌘ keys.
  • To compress a PDF without performing OCR, pass it to the Compress PDF Document File Action.
  • To view the progress tracker, invoke the workflow again with the keyword (default: ocr).

 

Configuration

To open the OCR Workflow Configuration, type the keyword preceded by a colon (default: :ocr).


 

Languages

Specify the languages you want the OCR process to consider by adding the appropriate RFC 5646 language tags. The following languages (and regions) are currently supported: en-US, fr-FR, it-IT, de-DE, es-ES, pt-BR, zh-Hans, zh-Hant, yue-Hans, yue-Hant, ko-KR, ja-JP, ru-RU, uk-UA


Explanations:

  • en-US: (English as used in the United States)
  • de-DE: (German as used in Germany)
  • fr-FR: (French as used in France)
  • it-IT: (Italian as used in Italy)
  • es-ES: (Spanish as used in Spain)
  • pt-BR: (Portuguese as used in Brazil)
  • ko-KR: (Korean as used in South Korea)
  • uk-UA: (Ukrainian as used in Ukraine)
  • ja-JP: (Japanese as used in Japan)
  • ru-RU: (Russian as used in Russia)
  • yue-Hant: (Cantonese, Traditional script)
  • yue-Hans: (Cantonese, Simplified script)
  • zh-Hant: (Chinese, Traditional script)
  • zh-Hans: (Chinese, Simplified script)

 


Change Log

 

v1.4.0 (OCR+)

  • Added bitmap compression and compression facets
  • Added embedding strategy options
    • "Word Granularity" attempts to embed the text word by word
    • "Line Oriented" is the strategy previously used (use if you encounter unexpected results)
  • Improved OCR embedding granularity
  • Fixed 'Recognizer init' error

v1.3.0 (OCR+)

  • Added PDF compression
  • Added a keyword for quick access to the workflow configuration (Alfred 5.1+)
  • Added Universal Action modifier option to apply compression to PDFs (⇧⌘)
  • Added Universal Action modifier option to open converted PDFs in the default application (⌥⇧)
  • Added a configuration option to open converted PDFs in the default application
  • Added a configuration option to specify how text should be joined when taking a snapshot
  • Added a File Action to compress PDF documents
  • Changed the modifier keys to replace a PDF and added noticeable visual cues (⌥⌘)
  • Changed the way an export strategy is specified by using a pop-up selection box
  • Improved performance

v1.2.3 (OCR+)

  • Fixed an error thrown due to missing workflow cache directory
  • Fixed snapshot tasks queuing up if they are started before the previous task has finished
  • Added explicit opt-out of Snapshot tasks while PDF conversions are running

v1.2.2 (OCR+)

  • Fixed low contrast output images produced for some PDF documents
  • Added progress tracker for the document recognition process
  • Added three options to handle document output: export to location, copy to same location and replace. Priority behavior: Replace > Copy > Export.
  • Added new icons
  • Improved output file size for PDF documents that do not already contain text

 

v1.2.0 (OCR Light)

  • Added a File Action to extract text from images
  • Fixed the workflow on macOS Sonoma (the script is now compiled en passant to compensate for the failure to link objc symbols on macOS 14)

v1.1.1 (OCR Light)

  • Added an if #available check to accommodate macOS 12.0

v1.1.0 (OCR Light)

  • Updated configuration and documentation
  • Added new icon
Edited by zeitlings
v1.2.0 OCR Light
Posted

Nice! This would be a quick and useful one to add to the Gallery. Just two notes:


The icon is very low resolution; it even looks pixelated in the editor. When exporting from SF Symbols you can pick the size, and the recommended size for workflows is 256×256 px.


Your repo is quite organised, but has workflows which can go in the Gallery (like this one) and others which cannot (like the dictionary workflow, due to the unsigned binary). That is OK, but because they are shared as releases (as opposed to files in the repo) it becomes harder to check for updates because GitHub only provides a unified releases feed. The more you release, the more difficult it’ll become to separate them. To be clear, posting to releases is the preferred method, just not when the repo has many unrelated workflows.


Would you consider having them on their own repo, or having some files checked-in which are modified when the corresponding workflow is updated, for example? Basically the idea is to provide something which can be checked for changes.


Also, may I recommend adding a Hotkey Trigger? I can see myself adding ⌘⇧6 as a natural shortcut to this one.

Posted (edited)

Sure, ⌘⇧6 feels like a natural extension. There's already an updated version that also bundles a higher-resolution icon. As for the dictionary workflow, it works by calling some cryptic API endpoints that are only accessible via Objective-C. Unfortunately, there is no way to do this in plain Swift that I know of.

 

I'll send you a message about the rest.

Edited by zeitlings
Posted
On 3/2/2023 at 1:27 PM, zeitlings said:

I noticed that Apple's Vision framework finally produces some usable results. This means: OCR without external dependencies!

 

AlfredOCR

 

Description: The workflow allows you to copy text from images using optical character recognition. Take a snapshot with your mouse or trackpad and the recognized text is copied to the clipboard. No external dependencies are required to perform the OCR.

 

‣ Download on Github

 

 

Nice workflow. Would it be possible to make it work in PDF files via a File Action?

Posted
4 hours ago, xilopaint said:

Nice workflow. Would it be possible to make it work in PDF files via a File Action?

 

I guess so. My first experiments, from which the workflow is derived, were actually with PDF documents. I'll play around with that sometime.

Posted

Ok, here's a follow-up. I was thinking about converting PDFs to searchable PDFs by embedding a hidden text layer. It turns out that PDFKit provides no access to the underlying PDF content streams at all, and no alternative way to embed text layers. At best, the information can be inserted as annotations, which are not embedded statically but remain objects that anyone can change at will.

 

This is rather annoying, because the Preview app shows that PDFKit is very much capable of embedding text layers. Example: When you open a PDF with no text or an image in the Preview app, the "Live Text" feature lets you select and copy recognized text as if OCR had been fully performed. When exporting the PDF you can even enable "Embed Text", which does exactly what we're trying to accomplish here. (And they do sell it as a feature of PDFKit).

 

Anyway, as it stands now, it's a convoluted process I haven't made sense of yet.

 

Pulling the plain text out of PDFs without an OCR layer isn't a problem, though. But I'm not sure how useful that is  ¯\_(ツ)_/¯

Posted

This would be a great addition if it were possible, and thank you for the super useful workflow in the meantime.

 

I've found this command line tool really useful for OCR on existing PDFs and have Alfred set up to trigger it with a workflow, but it would obviously be far more convenient if PDFKit were able to do the work instead.

 

Posted

😱 Try this!

 

I managed to get some acceptable results. The internal font handling and bounding box scaling rely on some heuristics for now, though.

Also, since there's no progress tracking and the code is completely synchronous, it's best to test the workflow on small documents. Still, the debugger will log some landmarks that you can review after the fact.

 

@sepulchra You're welcome 🤗 "OCRmyPDF" will most likely give you better results, and should probably remain your go-to if it is already set up. But at least here are a few steps towards a native solution 😁.

Posted (edited)

Hey, this is great. Would it be possible to add a modifier that overwrites the existing file instead of exporting to another location?

Edited by sepulchra
Posted (edited)
On 3/7/2023 at 2:34 PM, zeitlings said:

😱 Try this!

 

Wow! This is impressive and promising!

 

Would you consider adding a suffix to the name of the OCR’d document? I think it could be an option in the User Configuration. Also, I think /tmp is not a good export location. I’d suggest using ~/Desktop. That’s what ~/Desktop is for, so the user can later decide where to put the file.

Edited by xilopaint
Posted

@sepulchra yes! 😁

@xilopaint yes! 😁

 

I have given the project a little more attention, i.e. fixed some bugs, implemented some optimizations and added a proper progress tracking system.

There is now a light version to keep the binary-free workflow alive, and a plus version that adds the PDF processing to it.

 

v1.2.2 (OCR+)

  • Fixed low contrast output images produced for some PDF documents
  • Added progress tracker for the document recognition process
  • Added three options to handle document output: export to location, copy to same location and replace. Priority behavior: Replace > Copy > Export.
  • Added new icons
  • Improved output file size for PDF documents that do not already contain text
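
To make the priority rule (Replace > Copy > Export) concrete, here is a minimal shell sketch of how an output path could be picked. It is only an illustration: the function name resolve_output, its "1"/"0" flags and the "_ocr" suffix are hypothetical and not taken from the workflow's actual code.

```shell
# resolve_output SOURCE REPLACE COPY EXPORT_DIR
# REPLACE and COPY are "1" when the option is enabled (hypothetical flags).
# Prints the path the converted PDF would be written to.
resolve_output() {
  source_pdf="$1"
  replace="$2"
  copy="$3"
  export_dir="$4"
  name="$(basename "$source_pdf" .pdf)"
  if [ "$replace" = "1" ]; then
    # Highest priority: overwrite the source document
    echo "$source_pdf"
  elif [ "$copy" = "1" ]; then
    # Next: write a copy into the folder of the source document
    echo "$(dirname "$source_pdf")/${name}_ocr.pdf"
  else
    # Fallback: write into the configured export location
    echo "${export_dir}/${name}_ocr.pdf"
  fi
}
```

With both Replace and Copy enabled, the function returns the source path itself, because Replace wins.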
Posted (edited)
On 4/6/2023 at 8:13 PM, zeitlings said:

@xilopaint yes! 😁

 

The new version doesn't work for me. It fails when checking for a path that does not exist:
 

[02:20:57.247] Optical Character Recognition[Universal Action] Processing complete
[02:20:57.254] Optical Character Recognition[Universal Action] Passing output '/Users/xxxx/Desktop/test.pdf' to Arg and Vars
[02:20:57.257] Optical Character Recognition[Arg and Vars] Processing complete
[02:20:57.258] Optical Character Recognition[Arg and Vars] Passing output '' to Run Script
[02:20:57.369] STDERR: Optical Character Recognition[Run Script] 2023-04-07 02:20:57.365 ocr[2995:28153] The folder “progress.txt” doesn’t exist.
[02:20:57.376] Optical Character Recognition[Run Script] Processing complete
[02:20:57.378] Optical Character Recognition[Run Script] Passing output '- Error Domain=NSCocoaErrorDomain Code=4 "The folder “progress.txt” doesn’t exist." UserInfo={NSFilePath=/Users/xxxx/Library/Caches/com.runningwithcrayons.Alfred/Workflow Data/com.zeitlings.ocr/progress.txt, NSUserStringVariant=Folder, NSUnderlyingError=0x600000e7a760 {Error Domain=NSPOSIXErrorDomain Code=2 "No such file or directory"}} #0
  - super: NSObject
' to Conditional
[02:20:57.379] Optical Character Recognition[Conditional] Processing complete
[02:20:57.380] Optical Character Recognition[Conditional] Passing output '- Error Domain=NSCocoaErrorDomain Code=4 "The folder “progress.txt” doesn’t exist." UserInfo={NSFilePath=/Users/xxxx/Library/Caches/com.runningwithcrayons.Alfred/Workflow Data/com.zeitlings.ocr/progress.txt, NSUserStringVariant=Folder, NSUnderlyingError=0x600000e7a760 {Error Domain=NSPOSIXErrorDomain Code=2 "No such file or directory"}} #0
  - super: NSObject
' to Post Notification

 

I’ve also tried creating progress.txt myself. The workflow ran without errors, but the OCR’d PDF file was not created.

Edited by xilopaint
Posted

Ah, thanks for reporting that! Looks like the cache folder (where the temporary progress file lives) has to be manually created first.

 

If you'd like to run a quick test, you can replace the following line in both the Run Script and Script Filter objects. 

 

if [[ -f "${alfred_workflow_cache}/progress.txt" || -v pdf_path ]]; then

 

With

 

[[ -d "${alfred_workflow_cache}" ]] || mkdir -p "${alfred_workflow_cache}"
if [[ -f "${alfred_workflow_cache}/progress.txt" || -v pdf_path ]]; then

 

(And make sure to delete the manually created progress.txt again. If it exists, the workflow assumes that an OCR job is running.)
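
For anyone who wants to try the idea in isolation, here is a minimal sketch of the same guard as a standalone function. It assumes only what is described above: the cache folder may be missing, and an existing progress.txt means a job is running. The function name ocr_job_state and the labels it prints are made up for illustration.

```shell
# ocr_job_state CACHE_DIR
# Creates CACHE_DIR if it is missing, then reports whether a progress file exists.
# Hypothetical helper; only the progress.txt convention comes from the workflow.
ocr_job_state() {
  cache_dir="$1"
  # Make sure the cache folder exists so later reads and writes cannot fail
  mkdir -p "$cache_dir" || return 1
  if [ -f "$cache_dir/progress.txt" ]; then
    echo "running"   # a progress file is present, so a job is assumed to be active
  else
    echo "idle"      # no progress file, nothing is running
  fi
}
```

Inside a workflow script it would be called as ocr_job_state "${alfred_workflow_cache}".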

Posted

v1.2.3
• Fixed an error thrown due to missing workflow cache directory
• Fixed snapshot tasks queuing up if they are started before the previous task has finished
• Added explicit opt-out of Snapshot tasks while PDF conversions are running

Posted

@zeitlings Thanks for sharing your workflow. The generated file is huge: for an 80 MB PDF created from video screenshots, it produces an OCR'd PDF of 1.15 GB. Otherwise, I use PDF-XChange Editor on Windows and get a similar size after OCR. For your information and review.

Posted (edited)
On 4/7/2023 at 7:36 AM, zeitlings said:

v1.2.3
• Fixed an error thrown due to missing workflow cache directory
• Fixed snapshot tasks queuing up if they are started before the previous task has finished
• Added explicit opt-out of Snapshot tasks while PDF conversions are running

 

Thanks for the fix. I'm very impressed with the code simplicity given the quality of the text recognition. We can see Apple has done a great job with the Vision framework.

Edited by xilopaint
Posted (edited)
4 hours ago, TomBenz said:

@zeitlings Thanks for sharing your workflow. The generated file is huge: for an 80 MB PDF created from video screenshots, it produces an OCR'd PDF of 1.15 GB. Otherwise, I use PDF-XChange Editor on Windows and get a similar size after OCR. For your information and review.

 

It's true, the increase in file size can be significant. I did some comparisons on typical text-based PDF documents with DEVONthink's OCR, which uses ABBYY's engine in the background. In some cases, the workflow produced smaller files, so I guess the increase can hardly be avoided. ABBYY's in-house application can export PDFs with MRC compression applied to the images, which is great and would be awesome to have access to. But that goes way beyond what I had in mind for the workflow 😂.

 

What I find interesting is that the screenshots most likely don't have any previous text embedded in them, which should allow for a more efficient way to create the new PDF pages, and that the file size still escalates as it does. There is a difference between real-world image dimensions and device-specific metrics. Perhaps what looks like a rather modest screenshot could be a 4K ∞-DPI monstrosity? 

 

Image compression is its own subgenre, I suppose, and another rabbit hole to get lost in for sure. Maybe I'll look into it sometime in an idle moment.

 

You may want to take a look at @xilopaint's Alfred PDF Tools to reduce the pixel density of your document or to scale the images down before attempting the OCR. For really large documents, however, I would recommend using professional tools, as I myself will continue to do.

Edited by zeitlings
Posted (edited)

@TomBenz I just updated the workflow to include compression. 

 

If you remember the 1.2 MB sample you sent me, it blew up to 7.9 MB uncompressed. With compression, we level off at 589 KB on my machine without butchering the file 😁. However, compressing a PDF does not always result in a smaller file size, so I've decided to keep the uncompressed PDF document around if compression does not actually produce the desired result. Let me know what happens to the 1.15 GB document!
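
The "keep the uncompressed file unless compression actually helps" rule can be sketched in a few lines of shell. This is an illustration under my own assumptions, not the code baked into the workflow: keep_smaller and the labels it prints are hypothetical names.

```shell
# keep_smaller ORIGINAL CANDIDATE
# Replaces ORIGINAL with CANDIDATE only if CANDIDATE is strictly smaller;
# otherwise CANDIDATE is discarded and ORIGINAL stays untouched.
keep_smaller() {
  original="$1"
  candidate="$2"
  [ -f "$original" ] && [ -f "$candidate" ] || return 2
  # wc -c prints the byte count; $(( )) strips the padding some wc versions add
  original_size=$(( $(wc -c < "$original") ))
  candidate_size=$(( $(wc -c < "$candidate") ))
  if [ "$candidate_size" -lt "$original_size" ]; then
    mv -f "$candidate" "$original"   # compression helped: keep the smaller file
    echo "compressed"
  else
    rm -f "$candidate"               # no gain: keep the uncompressed document
    echo "uncompressed"
  fi
}
```

Called as keep_smaller document.pdf document.compressed.pdf, it leaves exactly one file behind, whichever is smaller.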

 

I've decoupled the compression utility, which means you can now compress any PDF without having to run OCR on it. For some documents this works really well, for others it may inflate the file size even more. Which factors play a role here remains a task for future experiments to find out... There are also some internal improvements, additions to the workflow and tweaks:

 

v1.3.0 (OCR+)

  • Added PDF compression
  • Added a keyword for quick access to the workflow configuration (Alfred 5.1+)
  • Added Universal Action modifier option to apply compression to PDFs (⇧⌘)
  • Added Universal Action modifier option to open converted PDFs in the default application (⌥⇧)
  • Added a configuration option to open converted PDFs in the default application
  • Added a configuration option to specify how text should be joined when taking a snapshot
  • Added a File Action to compress PDF documents
  • Changed the modifier keys to replace a PDF and added noticeable visual cues (⌥⌘)
  • Changed the way an export strategy is specified by using a pop-up selection box
  • Improved performance
Edited by zeitlings
Posted (edited)
14 hours ago, zeitlings said:

@TomBenz I just updated the workflow to include compression. 

 

If you remember the 1.2 MB sample you sent me, it blew up to 7.9 MB uncompressed. With compression, we level off at 589 KB on my machine without butchering the file 😁. However, compressing a PDF does not always result in a smaller file size, so I've decided to keep the uncompressed PDF document around if compression does not actually produce the desired result. Let me know what happens to the 1.15 GB document!

 

I've decoupled the compression utility, which means you can now compress any PDF without having to run OCR on it. For some documents this works really well, for others it may inflate the file size even more. Which factors play a role here remains a task for future experiments to find out... There are also some internal improvements, additions to the workflow and tweaks:

 

v1.3.0 (OCR+)

  • Added PDF compression
  • Added a keyword for quick access to the workflow configuration (Alfred 5.1+)
  • Added Universal Action modifier option to apply compression to PDFs (⇧⌘)
  • Added Universal Action modifier option to open converted PDFs in the default application (⌥⇧)
  • Added a configuration option to open converted PDFs in the default application
  • Added a configuration option to specify how text should be joined when taking a snapshot
  • Added a File Action to compress PDF documents
  • Changed the modifier keys to replace a PDF and added noticeable visual cues (⌥⌘)
  • Changed the way an export strategy is specified by using a pop-up selection box
  • Improved performance

 

I'm absolutely in love with this workflow. I like everything, even the icon color tone is appealing to me. Haven't you committed the code for the compression feature @zeitlings?

 

What kind of compression are you doing, btw? Is it similar to the work of my workflow’s optimize action you mentioned earlier?

 

Edited by xilopaint
Posted
14 hours ago, zeitlings said:

@TomBenz I just updated the workflow to include compression. 

 

If you remember the 1.2 MB sample you sent me, it blew up to 7.9 MB uncompressed. With compression, we level off at 589 KB on my machine without butchering the file 😁. However, compressing a PDF does not always result in a smaller file size, so I've decided to keep the uncompressed PDF document around if compression does not actually produce the desired result. Let me know what happens to the 1.15 GB document!

 

I've decoupled the compression utility, which means you can now compress any PDF without having to run OCR on it. For some documents this works really well, for others it may inflate the file size even more. Which factors play a role here remains a task for future experiments to find out... There are also some internal improvements, additions to the workflow and tweaks:

 

v1.3.0 (OCR+)

  • Added PDF compression
  • Added a keyword for quick access to the workflow configuration (Alfred 5.1+)
  • Added Universal Action modifier option to apply compression to PDFs (⇧⌘)
  • Added Universal Action modifier option to open converted PDFs in the default application (⌥⇧)
  • Added a configuration option to open converted PDFs in the default application
  • Added a configuration option to specify how text should be joined when taking a snapshot
  • Added a File Action to compress PDF documents
  • Changed the modifier keys to replace a PDF and added noticeable visual cues (⌥⌘)
  • Changed the way an export strategy is specified by using a pop-up selection box
  • Improved performance

Thank you, @zeitlings. I will test it over the weekend and report back.

Posted (edited)
On 4/27/2023 at 10:37 PM, xilopaint said:

I'm absolutely in love with this workflow. I like everything, even the icon color tone is appealing to me. Haven't you committed the code for the compression feature @zeitlings?

 

Thanks! 😁 The project is becoming more and more interesting for me as well. What I mean when I say that the compression utility is decoupled is just that I've rewritten the code internally so that it can be called independently of the OCR routine - as opposed to being an ad hoc change to the output content stream that writes the PDF to disk. It's still baked into the program. 

 

On 4/27/2023 at 10:37 PM, xilopaint said:

What kind of compression are you doing, btw? Is it similar to the work of my workflow’s optimize action you mentioned earlier?

 

I've checked out what you're doing with your optimize action. It looks like you're just tweaking the DPI/PPI, right? (k2pdfopt <>  -ui- -as -mode copy -dpi <>) Since I’ve also been looking into K2pdfopt, I came across the -bpc option, by the way, which reduces the number of bits per color plane. Maybe you already know about it, but this looks like it could be interesting for your implementation 😄

 

I am not actually touching the DPI at the moment. I am playing around with CoreImage, Quartz Filters and Image Bitmaps. 

 

Quartz Filters are something the Preview app also uses for its "compression", btw. You can find the macOS presets under `/System/Library/Filters`. I'm sure there is a way to apply them via the command line.
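
As a small starting point, the presets can at least be listed from the command line. The sketch below only prints the names of the .qfilter files in a folder; it does not apply any filter, and list_quartz_filters is a hypothetical helper name.

```shell
# list_quartz_filters [DIR]
# Prints the names of the *.qfilter presets found in DIR
# (defaults to the macOS preset folder /System/Library/Filters).
list_quartz_filters() {
  dir="${1:-/System/Library/Filters}"
  for f in "$dir"/*.qfilter; do
    [ -e "$f" ] || continue     # the glob matched nothing
    basename "$f" .qfilter      # drop the folder and the extension
  done
}
```

On a Mac, running list_quartz_filters without arguments should print the system presets, one per line.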

Edited by zeitlings
Posted
13 minutes ago, zeitlings said:
On 4/27/2023 at 5:37 PM, xilopaint said:

I'm absolutely in love with this workflow. I like everything, even the icon color tone is appealing to me. Haven't you committed the code for the compression feature @zeitlings?

 

Thanks! 😁 The project is becoming more and more interesting for me as well. What I mean when I say that the compression utility is decoupled is just that I've rewritten the code internally so that it can be called independently of the OCR routine - as opposed to being an ad hoc change to the output content stream that writes the PDF to disk. It's still baked into the program. 

 

I actually asked why you didn't add that piece of code to the repository.

Posted
18 minutes ago, zeitlings said:

I've checked out what you're doing with your optimize action. It looks like you're just tweaking the DPI/PPI, right? (k2pdfopt <>  -ui- -as -mode copy -dpi <>) Since I’ve also been looking into K2pdfopt, I came across the -bpc option, by the way, which reduces the number of bits per color plane. Maybe you already know about it, but this looks like it could be interesting for your implementation 😄

 

While it can often be used this way, the optimize action is not really intended to compress PDF files, but rather to improve the readability of low-quality scanned PDFs.

Posted
9 minutes ago, xilopaint said:

I actually asked why you didn't add that piece of code to the repository.

 

The piece of code you're referring to isn't very useful on its own, and I haven't uploaded any of my more complex Xcode projects yet. There are various reasons why I haven't shared them at the moment, but I appreciate your interest.

Posted
2 minutes ago, zeitlings said:

The piece of code you're referring to isn't very useful on its own, and I haven't uploaded any of my more complex Xcode projects yet. There are various reasons why I haven't shared them at the moment, but I appreciate your interest.

 

It's up to you to decide what to do with your code, as long as you're transparent. The problem here is that you use an MIT license. When it comes to Alfred's workflows, transparency is important for security reasons, of course.
