Workflow to reformat paragraph or combine lines of text

jonteamere · October 27, 2016

I'm looking for a workflow that can combine multiple lines of plain text into a single line. I guess it'd be similar to a 'reformat paragraph' function.

If a workflow would be difficult, are there any plain text apps that can handle that function?

deanishe · October 27, 2016

Add a Run Script action with Language = /usr/bin/python and "with input as argv" selected.

Script:

from __future__ import print_function
import re
import sys

print(re.sub(r'\n+', ' ', sys.argv[1]), end='')

That will replace one or more consecutive newlines with a space.
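For anyone who wants to check the behaviour outside Alfred, here is a minimal sketch of the same substitution wrapped in a function. The name `join_lines` is made up for illustration and is not part of the script above.

```python
import re


def join_lines(text):
    """Replace each run of one or more newlines with a single space."""
    return re.sub(r'\n+', ' ', text)


sample = "first line\nsecond line\n\nthird line"
print(join_lines(sample))
# prints: first line second line third line
```

Note that a blank line between two blocks of text is a run of newlines too, so it is also collapsed into a single space.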

Connect to whichever inputs/outputs you need.

jonteamere · October 27, 2016

Works like a charm! Thank you, Sir.

jonteamere · October 29, 2016

Suddenly stopped working. I don't know what's changed.

@deanishe are there any fixes or alternative ways?

deanishe · October 29, 2016

How am I supposed to tell what's wrong from "Suddenly stopped working"?

Provide the input, the expected output and the actual output. And anything shown in the debugger.

jonteamere · October 31, 2016

@deanishe I think I figured it out. It came down to \n vs. \r (which appeared to be "newlines"). Is there a way to change the code to recognize either newlines or carriage returns?

Also, at the risk of annoying you...

Say I have a document like the example below. It consists of groups of text separated by an empty line. So, TEXT + \n\n + TEXT, repeated. It'd be nice to highlight all content in the document and have the script search and reformat just the TEXT portions (if they contain carriage returns or newlines) while preserving the empty line between each 'paragraph' of text.

Quote

Nonetheless, while the true burden may be underestimated, the economic burden from both a sufferer’s and societal perspective is profound.

One database analysis found that direct endometriosis-related costs were considerable and appeared driven by hospitalizations

An actuarial analysis revealed that women with endometriosis incur total medical
costs that are, on average, 63% higher than medical costs for the average woman
in a commercially insured group.

Others reported being “totally incapacitated” and even dismissed from or left their jobs due to symptoms.

Most recent data indicate that the total annual burden of endometriosis-associated symptoms in the United States has reached a staggering $119 billion.

I appreciate any help that you can provide.

deanishe · October 31, 2016

2 minutes ago, jonteamere said:

@deanishe I think I figured it out. It came down to \n vs. \r (which appeared to be "newlines"). Is there a way to change the code to recognize either newlines or carriage returns?

Easy peasy:

from __future__ import print_function
import re
import sys

# Replace any combination of \n and/or \r with a single space
print(re.sub(r'[\r\n]+', ' ', sys.argv[1]), end='')
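To see why the first pattern can miss some line breaks, here is a small sketch comparing the two patterns on the three common line-ending styles. The function names are hypothetical, chosen only for this comparison.

```python
import re


def join_newlines_only(text):
    """Original pattern: only \\n counts as a line break."""
    return re.sub(r'\n+', ' ', text)


def join_any_line_break(text):
    """Revised pattern: any mix of \\r and \\n counts as a line break."""
    return re.sub(r'[\r\n]+', ' ', text)


samples = [
    "one\ntwo",    # LF only
    "one\rtwo",    # CR only
    "one\r\ntwo",  # CRLF
]

for text in samples:
    # repr() makes any leftover \r visible in the output
    print(repr(join_newlines_only(text)), repr(join_any_line_break(text)))
```

With the original pattern, the CR-only sample comes back unchanged and the CRLF sample keeps a stray \r before the space. The revised pattern turns all three into 'one two'.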

2 minutes ago, jonteamere said:

Say I have a document like the example below. It consists of groups of text separated by an empty line. So, TEXT + \n\n + TEXT, repeated. It'd be nice to highlight all content in the document and have the script search and reformat just the TEXT portions (if they contain carriage returns or newlines) while preserving the empty line between each 'paragraph' of text.

I appreciate any help that you can provide.

from __future__ import print_function
import sys

# Split text into lines and strip any whitespace at line ends
lines = [l.strip() for l in sys.argv[1].strip().splitlines()]

# Collect lines into paragraphs
paras = []
buf = []
for l in lines:
    if not l:  # empty line, i.e. new paragraph
        s = ' '.join(buf).strip()
        if s:
            paras.append(s)
        buf = []
    else:
        buf.append(l)

# Add last paragraph if there is one
s = ' '.join(buf).strip()
if s:
    paras.append(s)

print('\n\n'.join(paras), end='')
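For testing without Alfred, the same idea can be sketched as a function that takes a string and returns the reformatted text. This is a condensed variant rather than the exact script above, and `reformat_paragraphs` is a made-up name.

```python
def reformat_paragraphs(text):
    """Join the lines of each paragraph; keep one blank line between paragraphs."""
    paras = []
    buf = []
    # splitlines() treats \n, \r and \r\n as line breaks
    for line in text.strip().splitlines():
        line = line.strip()
        if line:
            buf.append(line)
        elif buf:
            # Blank line: close the current paragraph, if any
            paras.append(' '.join(buf))
            buf = []
    # Add the last paragraph if there is one
    if buf:
        paras.append(' '.join(buf))
    return '\n\n'.join(paras)


sample = "alpha\nbeta\n\n\ngamma\r\ndelta"
print(repr(reformat_paragraphs(sample)))
# prints: 'alpha beta\n\ngamma delta'
```

Several blank lines in a row are treated as one paragraph break, so the output always has exactly one empty line between paragraphs.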

jonteamere · October 31, 2016

@deanishe Brilliant! Just wonderful. Thanks so much.

jonteamere · August 10, 2017

Hey, @deanishe

So, I tried to use the above script on a rather long plain-text file (450,000+ characters) containing my notes. Selected all text → triggered workflow → got an error (attached below). Is there a character or paragraph limit? It's been working like a dream since you wrote it.

Let me know if I can provide you with any other info.

deanishe · August 10, 2017

That's far too much data to pass to a script.
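Command-line arguments have an operating-system size limit, so one possible workaround is to have the script read the text from a stream (a file or standard input) instead of argv. The sketch below is untested inside an Alfred workflow, and `reformat_stream` is a hypothetical name. It uses `io.StringIO` as a stand-in for the real input.

```python
import io


def reformat_stream(stream):
    """Read all text from a file-like object and reformat its paragraphs."""
    paras = []
    buf = []
    for line in stream.read().splitlines():
        line = line.strip()
        if line:
            buf.append(line)
        elif buf:
            # Blank line: close the current paragraph
            paras.append(' '.join(buf))
            buf = []
    if buf:
        paras.append(' '.join(buf))
    return '\n\n'.join(paras)


# Stand-in for a large input; a real filter would pass sys.stdin instead
demo = io.StringIO(u"line one\nline two\n\nnext paragraph\n")
print(repr(reformat_stream(demo)))
```

Assuming macOS's pbpaste and pbcopy, a script that ends with sys.stdout.write(reformat_stream(sys.stdin)) could then be run from a terminal on the clipboard contents, along the lines of: pbpaste | python reformat.py | pbcopy (reformat.py being whatever you name the file).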
