jmm28260

Workflow to identify language in pdf


It's still showing that you've set the input folder to one thing, and the output folder to another.
I can even see you edited the path of the directory variable to lead to a second folder called "2. Renommer".


When I assigned the variable directory back to both the input and the output folders, then changed the path of directory to point to my test folder where I copied in a bunch of PDFs, the workflow ran as expected.


My suggestion is that you first try and use the workflow exactly as I've laid out, and then you can play around with it a bit more when you are more familiar with what does what, etc.


If you still have problems, you need to be very specific, and pedantic, about where the very first unexpected occurrence takes place. You're using Mojave (I'm on High Sierra), so it's possible your Automator has more bugs in it (from the things I've heard, Mojave has been a nightmare for AppleScript and Automator). So, look at each action individually. For example, the first action is a Finder action, and they are notoriously buggy in some situations. Therefore, make sure it is actually finding your PDF files successfully (all of them).
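
One way to check that step outside Automator is a small stand-alone script run in Script Editor. This is only a sketch and not part of the workflow: it assumes the PDFs sit directly inside the chosen folder (subfolders are not searched), and the variable names are made up for illustration. On Mojave, expect a prompt asking for permission to control System Events the first time it runs.

```applescript
-- Hypothetical check, separate from the workflow:
-- list the PDF files that sit directly inside a chosen folder
set directory to choose folder with prompt "Choose the folder containing your PDFs"
set directoryPath to POSIX path of directory

tell application "System Events"
        set pdfNames to name of every file of folder directoryPath ¬
                whose name extension is "pdf"
end tell

-- Report how many were found, then return their names as the result
display dialog ((count of pdfNames) as text) & " PDF file(s) found in " & directoryPath
return pdfNames
```

If the count does not match what you see in the folder, the problem lies in retrieving the files rather than in the language detection.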


The script portion of the workflow appears to be working fine, even on your end. I know it's not producing the output you want yet, but there's no error being thrown and it completes its run. To me, that suggests the problem lies in the preceding actions, which are not set correctly and so are not producing the necessary output.


Worst comes to the worst, I can do a re-write of the whole thing.  What I would do is replace the Finder action with another AppleScript action instead, and get that to retrieve the PDF files reliably.  Then I'd recode the main AppleScript action to use less Objective-C and more AppleScript, as Objective-C can be quite strict with security protocols, and it's possible (but unlikely) that it's not permitting Automator to make changes to your filesystem.


But you do your bit first.  Then give me a couple of days and I'll see what's going on.


Indeed, Mojave might be the source of the problem; since I updated my OS, several of my Automator workflows no longer work.

But in this case, the workflow runs fine; all the actions are performed and green-checked. The only issue I see is that the AppleScript receives a .txt file and returns the same file, not a .pdf file with the language code appended to the name. My guess is that Mojave does not accept and handle the AppleScript as it should.

I might have to wait for Apple to correct that problem with its future updates.

However, I really want to thank you for all the time you have devoted to my problem.


Problems between Mojave and AppleScript are cited all over the net.

I tried to find some explanations on the web and here are 2 possibilities that might help:

I also found a possible answer that was suggested on a forum and that might work:

 

This worked for me, but it requires converting to an app first!

 

  • 1. Open the app's plist file in Xcode.
  • 2. Add a row (from the right-click context menu).
  • 3. In the 'Key' column, select 'Privacy - AppleEvents Sending Usage Description' from the drop-down menu (you need to scroll down).
  • 4. Add 'This script needs to control other applications to run.' in the 'Value' column.
  • 5. Build the application again. It should now prompt for accessibility and automation permissions.
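
For anyone who prefers to edit the plist as text rather than through Xcode's drop-down, the row added in the steps above should correspond to the raw key NSAppleEventsUsageDescription. A sketch of the entry as it would appear inside the top-level <dict> of the app's Info.plist, assuming the same description string as in step 4:

```xml
<!-- Raw form of 'Privacy - AppleEvents Sending Usage Description' -->
<key>NSAppleEventsUsageDescription</key>
<string>This script needs to control other applications to run.</string>
```

The wording of the string is free text; it is what the permission prompt displays to the user.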

Hope it helps...


@jmm28260 Sorry for the delay—been unwell.  (Also, it's a good idea to tag the person to whom you're replying in your post using the @ symbol followed by their username, otherwise they won't necessarily be notified by your reply; or, at least, I'm not, despite having the option selected).

 

On 1/8/2019 at 5:57 PM, jmm28260 said:

I tried to find some explanations on the web and here are 2 possibilities that might help:

 

On 1/8/2019 at 5:57 PM, jmm28260 said:

I also found a possible answer that was suggested on a forum and that might work:

 

OK, well done for finding those bits and pieces.  I'll leave those for you to attempt to implement if you really feel any of them are the problem.  I don't have Mojave and I don't intend to, so those recommendations aren't really something I can play around with.


In the meantime, I rewrote the Automator workflow from scratch. The workflow itself has been dramatically simplified because the AppleScript portion has taken over all of the functionality. I did this for a couple of reasons: 1) I wanted to reduce the number of workflow actions in which problems might arise. There are now only 3 actions, and none of them needs to be touched or edited, because there are no longer any input and output folder selections to make, which were a source of contention before. The only part of the workflow that requires customising is the path stored in the directory variable, and this can only be edited in the bottom pane of Automator, where variables are stored:

 

[Screenshot: the directory variable in the variables pane at the bottom of the Automator window]

 

2) Now that it is entirely script-based, one can also just copy and paste the AppleScript into Script Editor and run it from there. This is particularly useful if the Automator workflow fails on you: Script Editor is a lot more helpful with its error messages, which makes pinpointing any scripting issues a bit easier. Also, if it works in one environment but not the other, then that's an entirely different problem that I doubt I'll be able to help with.


Using this new method, no new files are created at any point, so no text files will appear in the PDF folder or on the desktop. The only effect the script will (should) have, run from either environment, is to rename the PDF files by appending the language code to the end of the name, then reveal those files in Finder. The script doesn't return any value, so don't worry if the workflow results appear empty at the end.


Here's the AppleScript:

 

use framework "Foundation"
use framework "Quartz"
use scripting additions

property this : a reference to current application

property NSArray : a reference to NSArray of this
property NSFileManager : a reference to NSFileManager of this
property NSLinguisticTagger : a reference to NSLinguisticTagger of this
property NSMutableDictionary : a reference to NSMutableDictionary of this
property NSString : a reference to NSString of this
property PDFDocument : a reference to PDFDocument of this

property samples : 4 -- The (maximum) number of pages to sample for text

 

 

on run filepaths
        set [filepaths] to filepaths & {null}

        if class of filepaths = script or filepaths = {} then set ¬
                filepaths to [(choose file of type ["com.adobe.pdf"] ¬
                with multiple selections allowed), null]
        set [filepaths, null] to filepaths

        -- Anything other than a list is treated as a folder to search for PDFs
        if class of filepaths ≠ list then
                set directory to POSIX file (POSIX path of filepaths) as alias
                set filepaths to filterContents at directory by ["pdf", "PDF"]
        end if

        set PDFs to {}
        repeat with PDFPath in filepaths
                set lang to probableLanguageForPDF at PDFPath
                set end of PDFs to my (stick on "_" & lang to PDFPath)
        end repeat

        tell application "Finder"
                reveal the PDFs
                activate
        end tell
end run

 

# filterContents
#   Creates a list of alias objects pointing to files inside a specified
#   +directory that have a file extension appearing in the +extension list
to filterContents at directory by extension as list
        local directory, extension

        set directory to directory's POSIX path

        script dir
                property filenames : ((NSFileManager's defaultManager()'s ¬
                        contentsOfDirectoryAtPath:directory ¬
                                |error|:(missing value))'s ¬
                        pathsMatchingExtensions:extension) as list
                property filepaths : {}
        end script

        repeat with fp in dir's filenames
                try
                        set end of filepaths in dir to ¬
                                POSIX file (((NSString's ¬
                                        stringWithString:directory)'s ¬
                                        stringByAppendingPathComponent:fp) ¬
                                        as text) as alias
                end try
        end repeat

        dir's filepaths
end filterContents

 

# stick
#   Appends a suffix to the filename (without extension) of the file at the
#   specified path, without altering the file extension
to stick on suffix to fp as text
        local fp, suffix

        set filename to null

        tell (NSString's stringWithString:(fp's POSIX path)) to if ¬
                false = ((the lastPathComponent()'s ¬
                stringByDeletingPathExtension()'s hasSuffix:suffix)) ¬
                as boolean then set filename to ¬
                (((the lastPathComponent()'s ¬
                        stringByDeletingPathExtension()'s ¬
                        stringByAppendingString:suffix))'s ¬
                        stringByAppendingPathExtension:(the ¬
                                pathExtension())) as text

        tell application "System Events" to tell the item named fp
                if filename = null then return it as alias
                set dir to its container
                set its name to filename
                return the item named filename in dir as alias
        end tell
end stick

 

# probableLanguageForPDF
#   Obtain the most likely language of a PDF file based on sampling a small
#   number of its pages and returning the most commonly detected language code
on probableLanguageForPDF at PDFPath as text
        local PDFPath

        set PDFFileURL to POSIX file (PDFPath's POSIX path) as alias

        set PDF to PDFDocument's alloc()'s initWithURL:PDFFileURL
        set PDFTitle to the PDF's documentAttributes() as text
        set N to the PDF's pageCount() as integer

        -- Ignore first and last page unless they are the only pages
        set PDFPageNumbers to array(0, N - 1)
        set a to item 2 of (PDFPageNumbers & {0})
        set b to item -2 of ({N - 1} & PDFPageNumbers)
        if a > b then set [a, b] to [b, a]

        set langs to NSMutableDictionary's dictionary()

        -- The language of the PDF's title
        set PDFTitleLang to (NSLinguisticTagger's ¬
                dominantLanguageForString:PDFTitle)
        langs's setValue:1 forKey:PDFTitleLang

        -- Select only a small sample of pages
        -- to obtain a language for each
        repeat with i from a to b by N div samples + 1
                set PDFPage to the (PDF's pageAtIndex:i)
                set PDFPageText to PDFPage's |string|()
                set PDFPageLang to (NSLinguisticTagger's ¬
                        dominantLanguageForString:PDFPageText)
                set [x] to references in {langs's valueForKey:PDFPageLang} & {0}
                (langs's setValue:((x as integer) + 1) forKey:PDFPageLang)
        end repeat

        -- The most common language identified
        set lang to the last item of (langs's ¬
                keysSortedByValueUsingSelector:"compare:")
        lang as text
end probableLanguageForPDF

 

# array()
#   Generate a list of consecutive (ascending) integers between +a and +b
on array(a as integer, b as integer)
        local a, b

        if a > b then set [a, b] to [b, a]

        script |integers|
                property list : {}
        end script

        repeat with i from a to b
                set the end of the list of |integers| to i
        end repeat

        return the list of |integers|
end array

 

 

Finally, here's the Automator workflow file. Should this link ever break, future readers will have no problem recreating the workflow themselves by copying and pasting the above script. It will work as a single AppleScript action inside Automator, or you can copy the workflow depicted in the image above, which simply sets a directory.

 

https://transfer.sh/4Ge72/Append Language To Name of PDF File.workflow.zip

 

@jmm28260, report how things turn out with this iteration, either from Automator, or from Script Editor, or from both.  If Automator fails, report how and why, but then definitely run from within Script Editor to confirm the error is script-generated rather than app-generated, and to get a more focused error report.

Edited by CJK

On 1/19/2019 at 12:18 AM, dfay said:

@CJK thanks for your work on this! Looking forward to having a bit of time to work through how you did the Applescript / ObjC integration.

 

Great.  Always happy to hear thoughts/comments/etc.


@CJK Thanks so much for your efforts. I am impressed by your work and thank you for your time.

I have tried running your AppleScript in Script Editor on Mojave, and here is the error message I get:

 

[Screenshot: Script Editor error message]

 

I have also used the Automator workflow you provided, following your instructions precisely. I set the directory variable to a specific folder containing a PDF file, and got the following message:

 

The action "Run AppleScript" encountered an error: "*** -[__NSDictionaryM setObject:forKey:]: key cannot be nil"

 

Looks like Mojave is reluctant to accept some instructions.
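
A "key cannot be nil" error from an NSMutableDictionary generally means a missing value was passed as the key. One plausible cause here, and this is an assumption rather than a confirmed diagnosis, is that no language could be determined for the title or for a sampled page (a scanned page with no extractable text, for instance), so the language code arrives as missing value. A minimal sketch of a guard follows; the handler name countLanguage is hypothetical and is not part of the script above.

```applescript
use framework "Foundation"
use scripting additions

-- Hypothetical helper: add one vote for a language code,
-- skipping the case where no language was detected
on countLanguage(langs, lang)
        -- A nil result from Objective-C arrives in AppleScript as missing value
        if lang is missing value then return

        set previous to langs's objectForKey:lang
        if previous is missing value then
                set n to 0
        else
                set n to previous as integer
        end if
        langs's setObject:(n + 1) forKey:lang
end countLanguage

-- Usage: the missing value is ignored instead of raising an error
set langs to current application's NSMutableDictionary's dictionary()
countLanguage(langs, "fr")
countLanguage(langs, missing value)
countLanguage(langs, "fr")
(langs's objectForKey:"fr") as integer
```

Running this on its own in Script Editor would at least show whether a nil key, rather than Mojave itself, is what triggers the message.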

 

 

 

