Regex to only add missing quotes to HTML attributes?

The Missing Attribute Quotes Problem:

I have HTML emails with clearly copy/pasted code where the original… authors failed to add double quotes around the values of some HTML attributes. This leads to the parser disregarding those attributes and breaking the emails.

Examples:

  • border=0
  • cellpadding=0
  • cellspacing=0
  • width=480

An example string to be fixed:

<table border=0 style="color: red;" title='Hello'></table>

An example of the string after being fixed:

<table border="0" style="color: red;" title='Hello'></table>

Determining the Regular Expression Logic

I know how to find and replace one or more consecutive characters:

$string = str_ireplace(array('_','--'),array('-','-'), $string);

What I do not know is how to find and replace two sets of strings in a single regular expression, or how the whole grouping part works.

I believe the pattern logic should be:

  1. Has < been encountered but not yet >? (We are within an element.)
  2. Has the first space been encountered? (We are past the element name.)
  3. Has = been encountered, and is the next immediate character not a quote?
  4. After the value itself (I think (.*?) matches it), is a space encountered?
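The steps above can be sketched as two passes rather than one large expression: an outer pattern finds each opening tag, and an inner pattern runs only on that tag's text. This is a minimal sketch, not a full HTML parser. The function name quoteBareAttributes is hypothetical, and it assumes that quotes inside each tag are balanced and that a bare value ends at whitespace or at >.

```php
<?php
// Hypothetical helper, not from the original post.
function quoteBareAttributes(string $html): string
{
    // Outer pass (steps 1 and 2): one opening tag, from "<" plus a letter
    // to the closing ">". Quoted values are consumed whole, so a ">"
    // inside quotes does not end the tag early. Closing tags such as
    // </table> are skipped because "/" is not a letter.
    $tag = '~<[a-zA-Z](?:"[^"]*"|\'[^\']*\'|[^>"\'])*>~';

    $result = preg_replace_callback($tag, static function (array $m): string {
        // Inner pass (steps 3 and 4): either an already-quoted value,
        // which is returned untouched, or "=" followed by a bare value.
        $attr = '~("[^"]*"|\'[^\']*\')|=([^\s"\'=<>`]+)~';

        return preg_replace_callback($attr, static function (array $a): string {
            if ($a[1] !== '') {
                return $a[0];              // already quoted: leave as is
            }
            return '="' . $a[2] . '"';     // bare value: wrap in double quotes
        }, $m[0]);
    }, $html);

    // preg_replace_callback returns null on a PCRE error; fall back to the input.
    return $result ?? $html;
}

echo quoteBareAttributes('<table border=0 style="color: red;" title=\'Hello\'></table>');
// <table border="0" style="color: red;" title='Hello'></table>
```

Text outside of tags (for example x=1 in a paragraph) is never seen by the inner pattern, which is the point of the outer pass.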

Some Attempts:

I’ve been going through pages here on Stack and other websites for a few hours, though no one that I’ve come across has even bothered to work out the logic, never mind the syntax. With the logic worked out, I have some small parts figured out:

  • preg_replace('/<(.*?)( )/u', '', $s) will match the element (regardless of its tag name) but deletes everything up to the first attribute. I don’t know how to write “do nothing here but make sure it exists” so that the expression is only applied to attributes, assuming the HTML author at least got the < and > formatting correct.
  • preg_replace('/^=(.*?)( )$/u', '', $s) was my attempt to start a match at = (with ^) and end it at a space (with $); however, this just deletes the entire string.
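For the “do nothing here but make sure it exists” part, one common technique is to wrap that part in a capture group and write it back with $1 in the replacement. Note also that ^ and $ anchor to the start and end of the whole subject string, not to the start and end of a match, so they are left out here. The snippet below is a deliberately naive sketch: it does not check that it is inside a tag, so it could also touch text such as " x=1" outside of any element.

```php
<?php
$s = '<table border=0 width=480>';

// Group 1: whitespace plus the attribute name (kept via $1).
// Group 2: a bare value, i.e. a run of characters that are not
//          whitespace, quotes, or ">" (wrapped in quotes via $2).
$fixed = preg_replace('/(\s[\w-]+)=([^\s"\'>]+)/', '$1="$2"', $s);

echo $fixed;
// <table border="0" width="480">
```

Because the value class excludes quote characters, attributes that already start with " or ' never match and are left alone.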

I can’t get the second part figured out. Then I presume there is a way to group the two individual expressions together.

Question:

How do I update my regular expression to append double quotes to HTML attributes only if they’re missing quotes?

Clarifications:

  1. I absolutely do not want to do anything else with regular expressions here. I believe my goal is reasonably simple (as opposed to something very expansive like this). I figured that the pattern is established enough and possibly not too complex for others with greater experience with regular expressions than I. If I can’t figure this out or get enough help I can just write my own parser just to resolve this issue.
  2. I attempted to use $dom = new DOMDocument; $dom->loadHTML($html); $xml = trim($dom->saveXML($dom)); and that did add the missing quotes, but this is HTML from email that triggered tons of errors from other malformed syntax problems, so ultimately it was an interesting though non-viable approach.
  3. To keep it as simple as possible I’m content to ignore single quotes.
  4. If this gets figured out I should be able to easily adapt this to make a second version to replace single quotes with double quotes (or vice-versa) for parsing consistency.
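For the adaptation mentioned in the last clarification, a possible sketch of the single-to-double-quote version follows. The function name singleToDoubleQuotes is hypothetical, and the sketch assumes quotes inside each tag are balanced. Any double quote found inside a single-quoted value is rewritten as &quot; so that the new delimiters stay valid.

```php
<?php
// Hypothetical helper, not from the original post.
function singleToDoubleQuotes(string $html): string
{
    // Match one opening tag, consuming quoted values whole.
    $tag = '~<[a-zA-Z](?:"[^"]*"|\'[^\']*\'|[^>"\'])*>~';

    $result = preg_replace_callback($tag, static function (array $m): string {
        // Inside the tag, visit every quoted value in order.
        return preg_replace_callback('~"[^"]*"|\'([^\']*)\'~', static function (array $a): string {
            if ($a[0][0] === '"') {
                return $a[0];  // already double-quoted: leave as is
            }
            // Single-quoted: escape inner double quotes, then swap delimiters.
            return '"' . str_replace('"', '&quot;', $a[1]) . '"';
        }, $m[0]);
    }, $html);

    // Fall back to the input if PCRE reports an error (null result).
    return $result ?? $html;
}

echo singleToDoubleQuotes('<table border="0" title=\'Hello\'></table>');
// <table border="0" title="Hello"></table>
```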