
: Join broken paragraphs Regex Sigil
I'm trying to edit some xhtml on Sigil.
With the command "<p>([a-z])" I'm able to find all paragraphs that begin with a lower-case letter. That tells me that they shouldn't be separate from the previous one. It's just a conversion issue.
What should I do to delete both the <p> from that paragraph and the </p> from the previous one in order to join the two blocks of text into a single paragraph?
It looks something like this:
<p> ... that is why relationships</p>
<p> are not what they should be.</p>
And it should be:
<p> that is why relationships are not what they should be.</p>
Free books android app tbrJar TBR JAR Read Free books online gutenberg
More posts by @Miguel

2 Comments
This worked and you can play with it if you want. Please copy it before changing it. regex101.com/r/gO0zG0/1.
A regex replace is like this in Perl: $s=~s/OLDPATTERN/NEWPATTERN/g;
The regex was: s/<\/p>\n\n<p>//g;
That is, whatever matches the pattern between the first two slashes is replaced with nothing. The \n SHOULD be interpreted correctly by the regex engine for your operating system (you shouldn't have to do a thing), because line endings differ between OSes. But if \n doesn't work, please let me know. You might have to try \r\n or \r.
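For anyone who wants to try the same substitution outside Sigil, here is a minimal sketch in Python. It uses Python's re module rather than Sigil's engine, and the sample string (including the blank line between the paragraphs) is an assumption based on the example in the question. Be aware that this pattern joins every pair of paragraphs separated by a blank line, not only the broken ones.

```python
import re

# Sample text from the question; the blank line between the two
# paragraphs is an assumption about how the converted file looks.
broken = (
    "<p> ... that is why relationships</p>\n\n"
    "<p> are not what they should be.</p>"
)

# (?:\r?\n){2} matches two line endings in a row, whether they are
# Unix style (\n) or Windows style (\r\n).
pattern = re.compile(r"</p>(?:\r?\n){2}<p>")

# Replace each match with nothing, which splices the paragraphs together.
joined = pattern.sub("", broken)
print(joined)
```

The words stay separated here only because the second paragraph happens to start with a space after its <p> tag.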
Sigil's regex engine is kind of fussy; I'm not sure that the anonymous answer above would work.
I answered this on Quora, but I'll post the answer again here, because it seems like that would be helpful:
Ah. Let me guess. You're converting from a PDF. You have my sympathy. :-)
I've done this. Here's the search expression I used:
([a-z]|,|;)</p>\s+<p>
That found paragraphs that ended without a period, exclamation point, question mark, or right double quotes. (Since I'm assuming you're working from a PDF of a print document, I didn't include in the query all of the other possible marks, since, for instance, a left double quotation mark is extremely unlikely to come at the end of a text line.)
The replace expression was simply this:
\1 
Note that there's a space after the \1 backreference, so that you don't end up smooshing words together.
That should get rid of the unwanted paragraph breaks and splice your text together properly.
Unfortunately, if you've got letters reproduced in the book or some other text format that has paragraphs ending in commas, semicolons, or lower case letters, you'll have to do Replace/Find and go case by case rather than Replace All.
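Here is a rough sketch of that search and replace in Python, in case it helps to test the expressions before running them in Sigil. It uses Python's re module, not Sigil's own engine, and the sample text is made up for illustration: the first paragraph break is spurious and the second is a real one.

```python
import re

# Hypothetical sample: the first break is spurious, the second is real.
xhtml = (
    "<p>that is why relationships</p>\n"
    "<p>are not what they should be.</p>\n"
    "<p>A new paragraph starts here.</p>"
)

# Group 1 captures the last character of the paragraph (a lower-case
# letter, comma, or semicolon); \s+ covers any whitespace between the tags.
pattern = re.compile(r"([a-z]|,|;)</p>\s+<p>")

# \1 puts the captured character back; the trailing space keeps the
# words on either side of the join from running together.
fixed = pattern.sub(r"\1 ", xhtml)
print(fixed)
```

The paragraph that ends with a period is left alone, because the character before its closing tag is not in the captured set.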