
Workflow to reformat paragraph or combine lines of text



@deanishe I think I figured it out. It came down to \n vs. \r (which appeared to be "newlines"). Is there a way to change the code to recognize either newlines or carriage returns?

 

Also, at the risk of annoying you...

 

Say I have a document like the example below. It consists of groups of text separated by an empty line. So, TEXT + \n \n + TEXT repeated. It'd be nice to highlight all content in the document and have the script search and reformat just the TEXT portions (if they contain carriage returns or newlines) while preserving the empty line between each 'paragraph' of text.

 

Quote

 

Nonetheless, while the true burden may be underestimated, the economic burden from both a sufferer’s and societal perspective is profound.

 

One database analysis found that direct endometriosis-related costs were considerable and appeared driven by hospitalizations

 

An actuarial analysis revealed that women with endometriosis incur total medical
costs that are, on average, 63% higher than medical costs for the average woman
in a commercially insured group.

 

Others reported being “totally incapacitated” and even dismissed from or left their jobs due to symptoms.

 

Most recent data indicate that the total annual burden of endometriosis-associated symptoms in the United States has reached a staggering $119 billion.

 

 

I appreciate any help that you can provide.

2 minutes ago, jonteamere said:

@deanishe I think I figured it out. It came down to \n vs. \r (which appeared to be "newlines"). Is there a way to change the code to recognize either newlines or carriage returns?

 

Easy peasy:

from __future__ import print_function
import re
import sys

# Replace any combination of \n and/or \r with a single space
print(re.sub(r'[\r\n]+', ' ', sys.argv[1]), end='')
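
To see what that pattern does with each kind of line ending, here is a minimal sketch that wraps the same substitution in a function. The name `join_lines` is made up for illustration and is not part of the workflow.

```python
import re


def join_lines(text):
    """Replace every run of \\r and/or \\n characters with a single space."""
    return re.sub(r'[\r\n]+', ' ', text)


# Unix (\n), old Mac (\r) and Windows (\r\n) line endings all collapse
# to one space, because the character class matches either character
# and "+" swallows consecutive ones.
for sample in ('one\ntwo', 'one\rtwo', 'one\r\ntwo', 'one\n\ntwo'):
    print(repr(sample), '->', repr(join_lines(sample)))
```

Note that the last sample shows a blank line being collapsed as well, so this version joins everything into one paragraph.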

 

2 minutes ago, jonteamere said:

Say I have a document like the example below. It consists of groups of text separated by an empty line. So, TEXT + \n \n + TEXT repeated. It'd be nice to highlight all content in the document and have the script search and reformat just the TEXT portions (if they contain carriage returns or newlines) while preserving the empty line between each 'paragraph' of text.

 

I appreciate any help that you can provide.

 

from __future__ import print_function
import sys

# Split text into lines and strip any whitespace at line ends
lines = [l.strip() for l in sys.argv[1].strip().splitlines()]

# Collect lines into paragraphs
paras = []
buf = []
for l in lines:
    if not l:  # empty line, i.e. new paragraph
        s = ' '.join(buf).strip()
        if s:
            paras.append(s)
        buf = []
    else:
        buf.append(l)

# Add last paragraph if there is one
s = ' '.join(buf).strip()
if s:
    paras.append(s)

print('\n\n'.join(paras), end='')
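
For anyone who wants to try the paragraph logic outside the workflow, here is a compact sketch of the same idea as a function. It swaps the explicit loop for `itertools.groupby`; the name `reflow` is hypothetical, and this is an alternative illustration rather than the script above.

```python
from itertools import groupby


def reflow(text):
    """Join the lines of each paragraph, keeping one blank line between paragraphs."""
    # splitlines() treats \n, \r and \r\n as line breaks
    lines = [line.strip() for line in text.splitlines()]
    # groupby() batches consecutive lines by truthiness: runs of
    # non-empty lines are paragraphs, runs of empty lines are separators
    paras = [' '.join(group)
             for has_text, group in groupby(lines, key=bool)
             if has_text]
    return '\n\n'.join(paras)


sample = 'first line\nsecond line\n\nnext para\rstill next para\n\n\nlast'
print(reflow(sample))
```

Several blank lines in a row count as a single separator, which matches the behaviour of the loop version.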

 

  • 9 months later...

Hey, @deanishe

 

So, I tried to use the above script on a rather long 450,000+ character plaintext file containing my notes. Selected all text → triggered workflow → got an error (attached below). Is there a character or paragraph limit? It's been working like a dream since you wrote it.

 

Let me know if I can provide you with any other info.

Screenshot_2017-08-09 20.30.25_D6sAsD.png
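
The screenshot is not reproduced here, so the actual error is unknown. One possible cause, which is an assumption and not something confirmed in this thread, is that both scripts receive the text through `sys.argv[1]`, and operating systems cap the total size of command-line arguments. A sketch of a variant that reads from a file-like object instead is shown below. The name `reflow_stream` is hypothetical, and it assumes the workflow can be set up to pipe the selected text to the script rather than pass it as an argument.

```python
import io
from itertools import groupby


def reflow_stream(src, dst):
    """Read text from src, join lines within paragraphs, write the result to dst."""
    lines = [line.strip() for line in src.read().splitlines()]
    paras = [' '.join(group)
             for has_text, group in groupby(lines, key=bool)
             if has_text]
    dst.write('\n\n'.join(paras))


# Demo with in-memory streams standing in for stdin/stdout.
# The input is about 500,000 characters long.
big_input = io.StringIO('some note text\rmore of the same note\r\r' * 13000)
output = io.StringIO()
reflow_stream(big_input, output)
print(len(output.getvalue()))
```

In a real script the call would be `reflow_stream(sys.stdin, sys.stdout)` after `import sys`, so the text never passes through the argument list.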
