
AlfredOCR - Optical Character Recognition



15 minutes ago, xilopaint said:

When it comes to Alfred's workflows, transparency is important for security reasons, of course.

 

Sure, I completely agree and understand the concern. The binaries are signed with my personal credentials, which at least means that if something sinister were to go on, I could hardly escape the blame. I would prefer to have them notarized, but alas, Apple charges for that, and since this is just a hobby, I've decided not to do so for now. If that isn't enough to inspire confidence, I won't blame anyone, but ... 🤷‍♂️

 

As for the MIT license being a problem here, I don't think it's a problem at all: The software is provided "as is". 

6 hours ago, zeitlings said:

I would prefer to have them notarized, but alas, Apple charges for that

 

Authority=Apple Development: pat****@*****.com (W2VCU4XR6K)
Authority=Apple Worldwide Developer Relations Certification Authority
Authority=Apple Root CA

 

Doesn’t the codesign output above indicate that you’re already enrolled in the Apple Developer Program, which grants you the privilege of notarizing the app without additional costs?

2 minutes ago, zeitlings said:

 

Nope

 

I know there’s an annual cost for the enrollment. My question was about the codesign output for your binary:

 

Authority=Apple Development: pat****@*****.com (W2VCU4XR6K)
Authority=Apple Worldwide Developer Relations Certification Authority
Authority=Apple Root CA

 

I thought this kind of output implied that you were already enrolled, which means you have already paid for the annual membership and are able to notarize binaries without any additional costs. If it's not true, what kind of certificate is this?

1 hour ago, xilopaint said:

Never mind, I just managed to do it by configuring the "Signing and Capabilities" section in Xcode.

 

Ah, yes, for the bare minimum, checking "Automatically manage signing" and selecting a Team should be enough. To have a Command Line Tool type project signed and populated with some metadata, I manually create an Info.plist file for the project and make sure it is embedded in the binary via Build Settings. 
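
For anyone who wants to script this, here is a minimal sketch that writes such an Info.plist with Python's `plistlib`. The keys are standard bundle keys; the identifier, name, and version values are placeholders, and the helper name is invented for this example. Embedding the file is presumably done through the "Create Info.plist Section in Binary" build setting (`CREATE_INFOPLIST_SECTION_IN_BINARY`), which is an assumption about the setup described above.

```python
import plistlib


def build_info_plist(identifier: str, name: str, version: str) -> bytes:
    """Serialize a minimal Info.plist (XML format) for a command-line tool."""
    info = {
        "CFBundleIdentifier": identifier,
        "CFBundleName": name,
        "CFBundleShortVersionString": version,
        "CFBundleVersion": version,
        "CFBundleInfoDictionaryVersion": "6.0",
    }
    return plistlib.dumps(info)


if __name__ == "__main__":
    # Placeholder values, not the ones used by the actual workflow.
    with open("Info.plist", "wb") as fh:
        fh.write(build_info_plist("com.example.ocr", "ocr", "1.0.0"))
```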

 

infoplist.png

  • 4 weeks later...
On 3/2/2023 at 7:27 PM, zeitlings said:

Alfred OCR

I noticed that Apple's Vision framework finally produces some usable results.

This means: No external dependencies are required to perform the OCR.

 

Download On Github

 


Alfred OCR Light

 

The workflow allows you to copy text from images using optical character recognition.

Take a snapshot with your mouse or trackpad to automatically copy the recognized text to the clipboard. 

 


Alfred OCR+

 

The workflow allows you to copy text from images, or to convert PDF files into searchable PDF documents
using optical character recognition, and to apply compression to PDF documents.

 

1 / Snapshot
Take a snapshot with your mouse or trackpad to automatically copy the recognized text to the clipboard. 

  • Default shortcut: ⌘+⇧+6 
  • Default keyword: ocr

 

2 / PDF Document

  • To convert a PDF into a searchable PDF document, pass it to the workflow’s Universal Action
    • To compress the resulting PDF, pass the source document on while pressing the ⌘+⇧ keys.
    • To open the resulting PDF, pass the source document on while pressing the ⌥+⇧ keys.
    • To force the replacement of a source document, pass it on while pressing the ⌥+⌘ keys.
  • To compress a PDF without performing OCR, pass it to the Compress PDF Document File Action.
  • To view the progress tracker, invoke the workflow again with the keyword (default: ocr).

preview_ocr1.png

 preview_ocr2.png

 

Configuration

To open the OCR Workflow Configuration, type the keyword preceded by a colon (default: :ocr).

preview_ocr3.png

 

Languages

Specify the languages you want the OCR process to consider by adding the appropriate RFC 5646 language tag. The following languages (and regions) are currently supported: en-US, fr-FR, it-IT, de-DE, es-ES, pt-BR, zh-Hans, zh-Hant, yue-Hans, yue-Hant, ko-KR, ja-JP, ru-RU, uk-UA


Explanations:

  • en-US: (English as used in the United States)
  • de-DE: (German as used in Germany)
  • fr-FR: (French as used in France)
  • it-IT: (Italian as used in Italy)
  • es-ES: (Spanish as used in Spain)
  • pt-BR: (Portuguese as used in Brazil)
  • ko-KR: (Korean as used in South Korea)
  • uk-UA: (Ukrainian as used in Ukraine)
  • ja-JP: (Japanese as used in Japan)
  • ru-RU: (Russian as used in Russia)
  • yue-Hant: (Traditional Cantonese)
  • yue-Hans: (Simplified Cantonese)
  • zh-Hant: (Traditional Chinese)
  • zh-Hans: (Simplified Chinese)
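
To make the tag format concrete, here is a small sketch that splits a comma-separated languages value and normalizes the casing of each tag (lowercase language, uppercase two-letter region, title-case four-letter script). It only checks the shape of a tag, not whether the OCR engine supports it, and the function names are invented for this example rather than taken from the workflow.

```python
import re

# language subtag (2-3 letters), then either a 4-letter script or a 2-letter region
_TAG = re.compile(r"^([A-Za-z]{2,3})-([A-Za-z]{4}|[A-Za-z]{2})$")


def normalize_tag(tag: str) -> str:
    """Normalize casing of a 'language-Region' or 'language-Script' tag."""
    match = _TAG.match(tag.strip())
    if match is None:
        raise ValueError(f"not a language-region or language-script tag: {tag!r}")
    language, subtag = match.groups()
    # Four letters means a script subtag (e.g. Hans), two letters a region (e.g. US).
    subtag = subtag.title() if len(subtag) == 4 else subtag.upper()
    return f"{language.lower()}-{subtag}"


def parse_languages(value: str) -> list[str]:
    """Split a comma-separated list of tags, skipping empty entries."""
    return [normalize_tag(part) for part in value.split(",") if part.strip()]
```

For example, `parse_languages("en-US, de-DE, fr-FR")` gives a list of three tags, and a value such as `english` raises a `ValueError`.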

 


Change Log

 

v1.3.0 (OCR+)

  • Added PDF compression
  • Added a keyword for quick access to the workflow configuration (Alfred 5.1+)
  • Added Universal Action modifier option to apply compression to PDFs (⇧⌘)
  • Added Universal Action modifier option to open converted PDFs in the default application (⌥⇧)
  • Added a configuration option to open converted PDFs in the default application
  • Added a configuration option to specify how text should be joined when taking a snapshot
  • Added a File Action to compress PDF documents
  • Changed the modifier keys to replace a PDF and added noticeable visual cues (⌥⌘)
  • Changed the way an export strategy is specified by using a pop-up selection box
  • Improved performance

 

v1.2.3 (OCR+)

  • Fixed an error thrown due to missing workflow cache directory
  • Fixed snapshot tasks queuing up if they are started before the previous task has finished
  • Added explicit opt-out of Snapshot tasks while PDF conversions are running

v1.2.2 (OCR+)

  • Fixed low contrast output images produced for some PDF documents
  • Added progress tracker for the document recognition process
  • Added three options to handle document output: export to location, copy to same location and replace. Priority behavior: Replace > Copy > Export.
  • Added new icons
  • Improved output file size for PDF documents that do not already contain text
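
The priority rule (Replace > Copy > Export) can be pictured with a short sketch like the one below. The flag names and the `_ocr` suffix are hypothetical and only serve the illustration; the workflow's real option names and naming scheme may differ.

```python
from pathlib import Path


def resolve_output(source, export_dir, replace=False, copy_to_same=False,
                   suffix="_ocr"):
    """Pick the output path, honoring the priority Replace > Copy > Export."""
    src = Path(source)
    if replace:
        # Highest priority: overwrite the source document in place.
        return src
    if copy_to_same:
        # Next: write a renamed copy next to the source document.
        return src.with_name(f"{src.stem}{suffix}{src.suffix}")
    # Fallback: export into the configured directory under the original name.
    return Path(export_dir) / src.name
```

With both `replace` and `copy_to_same` set, the source path itself is returned, since Replace takes precedence.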

 


v1.1.0 (OCR Light)

  • Updated configuration and documentation
  • Added new icon

 

Hi zeitlings, could you please add Arabic OCR? It would help me so much. 💐

  • 3 weeks later...

@zeitlings, the workflow is broken for me:
 

[21:00:20.969] OCR[Universal Action] Processing complete
[21:00:20.987] OCR[Universal Action] Passing output '/Users/xxx/Desktop/sample.pdf' to Arg and Vars
[21:00:20.989] OCR[Arg and Vars] Processing complete
[21:00:20.990] OCR[Arg and Vars] Passing output '' to Run Script
[21:00:21.014] OCR[Run Script] Processing complete
[21:00:21.016] OCR[Run Script] Passing output 'OCR Failure: Recognizer init failure (ocr)
' to Conditional
[21:00:21.017] OCR[Conditional] Processing complete
[21:00:21.017] OCR[Conditional] Passing output 'OCR Failure: Recognizer init failure (ocr)
' to Debug
[21:00:21.018] OCR[Debug] 'OCR Failure: Recognizer init failure (ocr)
', {
  compress = "0"
  DEV = ""
  export_path = "/Users/xxx/Desktop"
  export_strategy = "exp.same"
  gristle = "nl"
  key = "ocr"
  languages = "en-US, de-DE, fr-FR"
  open_pdf = "1"
  pdf_path = "/Users/xxx/Desktop/sample.pdf"
  revision = "3"
}
[21:00:21.019] OCR[Debug] Processing complete
[21:00:21.019] OCR[Debug] Passing output 'OCR Failure: Recognizer init failure (ocr)
' to Play Sound
Edited by xilopaint
  • 4 weeks later...
On 8/16/2023 at 12:45 PM, xilopaint said:

Any news?

 

Hey, yeah. The program has undergone quite a few internal changes and I am not 100% happy with all of the results of the alterations.

However, I'm cobbling together a version that lets you fall back to the previous (v1.3.0) state if something unexpected should happen.

 

On 8/17/2023 at 6:57 AM, Afoan said:

There was a way I could use the Arabic language. Now I can't remember how I did it. Can anyone help?

 

Hey @Afoan, since Apple doesn't support Arabic right now I don't think it ever worked with this workflow.

Edited by zeitlings

 

OCR v1.4.0

  • Added bitmap compression and compression facets
  • Added embedding strategy options
    • "Word Granularity" attempts to embed the text word by word
    • "Line Oriented" is the strategy previously used (use if you encounter unexpected results)
  • Improved OCR embedding granularity
  • Fixed 'Recognizer init' error

 

The new default is an improved embedding strategy that tries to align every word with its underlying picture. This works best with straightforward text documents, but may sometimes yield unexpected results. If that is the case for you, you may want to opt for the "Line Oriented" embedding strategy, which was the default up until now. The compression facet "Aggressive (Quartz)" was also the default until now.

 

NB: Compressing a PDF document that already contains text with any bitmap method, without also running OCR on it, may result in inaccuracies in the position of embedded words.


@zeitlings The software got stuck on my first attempt with the “Word Granularity” strategy, so I had to kill the ocr process and delete the cache file to make it work with the “Line Oriented” strategy, otherwise the workflow will be permanently unusable. To fix the bug you need to ensure the cache is cleared or rewritten at the start of each run.
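
A minimal sketch of that idea, assuming the state in question is a progress log kept in the workflow cache directory: the log is recreated empty at the start of every run, and the directory is created if it is missing. `alfred_workflow_cache` is the environment variable Alfred sets for workflows; the file name `progress.log` and the function name are hypothetical.

```python
import os
import tempfile
from pathlib import Path


def reset_progress_log(cache_dir=None, name="progress.log"):
    """Create the cache directory if needed and truncate the progress log."""
    # Fall back to the system temp directory when run outside of Alfred.
    base = Path(cache_dir
                or os.environ.get("alfred_workflow_cache")
                or tempfile.gettempdir())
    base.mkdir(parents=True, exist_ok=True)
    log = base / name
    # Opening for writing truncates any leftover state from a killed run.
    log.write_text("")
    return log
```

Calling this before any OCR work starts means a process that was killed mid-run cannot leave behind a log that blocks the next run.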

Edited by xilopaint
3 hours ago, zeitlings said:

@xilopaint, is this happening with all PDFs you are testing?

 

I just tried with a second file. This time the process has been completed with the “Word Granularity” strategy but the OCR quality with "Line Oriented" is way better.

 

3 hours ago, zeitlings said:

Is it possible that an old progress log file was still around from before updating?

 

It doesn't matter. After clearing the cache I always face the same issue with “Word Granularity” if I run the workflow with the first PDF file.

Edited by xilopaint

Could you share a - perhaps truncated - version of the PDF file where this happens? Maybe I can trace what happens under the hood and where things go wrong with it. Thanks for making the tests!

 

May I ask what kind of PDFs you usually scan? (My primary test sources are journal articles.)

Edited by zeitlings
  • 1 month later...
  • 4 weeks later...
On 10/15/2023 at 5:11 AM, TomBenz said:

In OCR Light, please consider adding a Universal File Action that runs OCR on an image file and copies the content to the clipboard.

Added to version 1.2.0


 

OCR Light v1.2.0

  • Add File Action to extract text from images
  • Fix for macOS Sonoma (Compiles the script en passant to compensate for the failure to link objc symbols on macOS 14).
Edited by zeitlings
