I am editing the content of a book in ePub format. My task is to keep single-letter words from appearing at the end of a line in paragraph (<p>) and heading (<h1> to <h6>) elements.
To do
This is what we must do:
- Use a PHP parser or a PHP script to do the job.
- Target a set of single-letter words declared earlier.
- The set contains the following characters: w W u U i I o O a A z Z – (this set is for the Polish language).
- A single HTML file may contain hundreds of instances, with multiple instances of each word.
Example of an input and output string/file
Input file
<h4>Title a title I title O title z title</h4>
<p>A Lorem ipsum i dolor sit – amet, w consectetur z adipiscing elit. Quisque tincidunt nisi. Z nec arcu convallis pulvinar.</p>
Output file
<h4>Title a&nbsp;title I&nbsp;title O&nbsp;title z&nbsp;title</h4>
<p>A&nbsp;Lorem ipsum i&nbsp;dolor sit&nbsp;– amet, w&nbsp;consectetur z&nbsp;adipiscing elit. Quisque tincidunt nisi. Z&nbsp;nec arcu convallis pulvinar.</p>
- The desired code should be CASE SENSITIVE.
- In the case of the dash (–), some publishers force it to stay at the end of a line, others at the beginning of the next line. The code should allow the user to switch between the two standards.
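A minimal sketch of how the dash convention could be made switchable. The function name `bind_dash` and its parameters are made up for illustration. It assumes PHP 7.0 or newer, that dashes are en dashes (U+2013) surrounded by plain spaces, and that the text passed in contains no markup. It writes `&#160;`, the numeric equivalent of `&nbsp;`, on the assumption that the ePub files are XHTML, where the named entity may be undefined; any other string can be passed in instead.

```php
<?php
// Hypothetical helper (name and parameters are illustrative).
// $dashPosition = 'end'   keeps the dash at the end of a line:     "sit&#160;– amet"
// $dashPosition = 'start' lets the dash begin the following line:  "sit –&#160;amet"
function bind_dash(string $text, string $dashPosition = 'end', string $nbsp = '&#160;'): string
{
    $dash = "\u{2013}"; // en dash; this escape syntax needs PHP 7.0+

    if ($dashPosition === 'end') {
        // Replace the plain space(s) BEFORE a spaced dash with a non-breaking space.
        $result = preg_replace('/(?<=\S) +\x{2013}(?= )/u', $nbsp . $dash, $text);
    } elseif ($dashPosition === 'start') {
        // Replace the plain space(s) AFTER a spaced dash with a non-breaking space.
        $result = preg_replace('/(?<= )\x{2013} +(?=\S)/u', $dash . $nbsp, $text);
    } else {
        throw new InvalidArgumentException("dashPosition must be 'end' or 'start'");
    }

    // preg_replace() returns null on failure (e.g. invalid UTF-8); keep the input then.
    return $result ?? $text;
}
```

Calling `bind_dash($text, 'end')` binds the dash to the preceding word, and `bind_dash($text, 'start')` binds it to the following word, so switching standards is a one-argument change.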
List of sub-jobs the code must do:
- A note: I am a beginner at this. It is enough if you help me write a solution for the first 1-2 letters. After seeing your solution, I will do the rest.
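- change "w " → "w&nbsp;"
- change "W " → "W&nbsp;"
- change "u " → "u&nbsp;"
- change "U " → "U&nbsp;"
- change "i " → "i&nbsp;"
- change "I " → "I&nbsp;"
- change "o " → "o&nbsp;"
- change "O " → "O&nbsp;"
- change "a " → "a&nbsp;"
- change "A " → "A&nbsp;"
- change "z " → "z&nbsp;"
- change "Z " → "Z&nbsp;"
- change " –" → "&nbsp;–"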
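The letter sub-jobs above could be handled by one case-sensitive regular expression instead of one find-and-replace per letter. The following is only a sketch: the function names `bind_single_letters` and `fix_orphans_in_html` are made up, the default letter list is taken from the sub-job list, and it assumes PHP 7.0 or newer, UTF-8 input, lowercase tag names, and `<p>` / `<h1>` to `<h6>` elements that are not nested inside each other. It uses regular expressions rather than a full HTML parser, and it writes `&#160;` (equivalent to `&nbsp;`) by default.

```php
<?php
// Hypothetical helpers; names and defaults are illustrative, not from a library.

/** Bind each listed single-letter word to the word that follows it. */
function bind_single_letters(string $text, string $letters = 'wWuUiIoOaAzZ', string $nbsp = '&#160;'): string
{
    $class = preg_quote($letters, '/');
    // <[^>]*>(*SKIP)(*FAIL)  : skip over tags, so "<a href=...>" is never touched
    // (?<![\p{L}\p{N}])      : the letter must not be the tail of a longer word
    // ([...])                : one listed letter; no /i flag, so matching is case-sensitive
    // [ ]+(?=\S)             : plain space(s) followed by more content
    $pattern = '/<[^>]*>(*SKIP)(*FAIL)|(?<![\p{L}\p{N}])([' . $class . '])[ ]+(?=\S)/u';

    // preg_replace() returns null on failure; fall back to the original text.
    return preg_replace($pattern, '${1}' . $nbsp, $text) ?? $text;
}

/** Apply bind_single_letters() only inside <p> and <h1>-<h6> elements. */
function fix_orphans_in_html(string $html, string $letters = 'wWuUiIoOaAzZ', string $nbsp = '&#160;'): string
{
    // Group 1: opening tag, group 3: inner content, group 4: closing tag.
    $elementPattern = '/(<(p|h[1-6])\b[^>]*>)(.*?)(<\/\2>)/us';

    $result = preg_replace_callback(
        $elementPattern,
        function (array $m) use ($letters, $nbsp): string {
            return $m[1] . bind_single_letters($m[3], $letters, $nbsp) . $m[4];
        },
        $html
    );

    return $result ?? $html;
}
```

For a whole file, something like `file_put_contents($out, fix_orphans_in_html(file_get_contents($in)));` would apply it in one pass. To start with only one or two letters, pass a shorter list such as `'wW'` as the second argument.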
Just a note
Currently I just do a find-and-replace. This crude method works and was acceptable for a single book per year. However, now I am about to do dozens.