How to map visible text indices to their location in an HTML tree

I am trying to implement a web-based rich text editor that automatically decorates the user’s text as they type (think spellcheck).

The issue is that the server only processes raw text, and returns annotations with their index + length in the raw text.

So the complete flow must look like this:

1. When the spellcheck routine triggers, convert the contents of the HTML structure into raw text.
2. Query the server for spellcheck annotations.
3. From the returned indices, find the corresponding portion of the HTML and wrap it in underline tags.
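
To make the index bookkeeping concrete, here is a minimal sketch of what I have in mind for the first and last steps. It is not tied to Pell or Rangy: the tree types and the function names (`flattenText`, `locateAnnotation`) are hypothetical stand-ins for DOM text and element nodes, and it assumes the raw text is simply the concatenation of text nodes in document order, with indices counted in UTF-16 code units. It deliberately ignores whitespace collapsing and block boundaries, which is exactly where my real attempts go wrong.

```typescript
// Hypothetical minimal tree; in a browser these would be DOM Text and Element nodes.
type TextLeaf = { kind: "text"; data: string };
type ElementBranch = { kind: "element"; tag: string; children: TreeItem[] };
type TreeItem = TextLeaf | ElementBranch;

// Where a text node lands in the raw text: [start, end).
interface TextSegment { node: TextLeaf; start: number; end: number }

// Part of an annotation that falls inside one text node (offsets into node.data).
interface NodeSlice { node: TextLeaf; from: number; to: number }

// Concatenate text nodes in document order and remember each node's offsets.
function flattenText(root: TreeItem): { rawText: string; segments: TextSegment[] } {
  const segments: TextSegment[] = [];
  let rawText = "";
  const visit = (item: TreeItem): void => {
    if (item.kind === "text") {
      segments.push({ node: item, start: rawText.length, end: rawText.length + item.data.length });
      rawText += item.data;
    } else {
      item.children.forEach((child) => visit(child));
    }
  };
  visit(root);
  return { rawText, segments };
}

// Translate an (index, length) annotation on the raw text into per-node slices.
function locateAnnotation(segments: TextSegment[], index: number, length: number): NodeSlice[] {
  const stop = index + length;
  const slices: NodeSlice[] = [];
  for (const seg of segments) {
    // Skip empty nodes and nodes that do not overlap [index, stop).
    if (seg.start === seg.end || seg.end <= index || seg.start >= stop) continue;
    slices.push({
      node: seg.node,
      from: Math.max(index, seg.start) - seg.start,
      to: Math.min(stop, seg.end) - seg.start,
    });
  }
  return slices;
}

// Example: a <span> containing "foo" immediately followed by the text "bar".
const tree: TreeItem = {
  kind: "element",
  tag: "div",
  children: [
    { kind: "element", tag: "span", children: [{ kind: "text", data: "foo" }] },
    { kind: "text", data: "bar" },
  ],
};
const flattened = flattenText(tree);
// An annotation covering 3 characters from index 2 crosses the tag boundary.
const slices = locateAnnotation(flattened.segments, 2, 3);
```

Each returned slice could then be wrapped separately (for example with one DOM Range per slice), which is how I would expect to handle a word that is cut in two by a tag.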

I first tried Rangy, in particular its TextRange module, but I am getting inconsistent results: the string generated for step 1 is hard to predict and sometimes incorrect (see the example below).

<!-- Rangy's TextRange.text() yields "foo bar" while the user sees "foobar" -->
<span>foo</span>
bar

I’m looking for a robust solution that can handle Unicode characters, words split in the middle by a tag, and any other unusual HTML structure.

FYI, I am using the Pell rich text editor, but the problem is the same with any contenteditable-based editor, and if another editor solves this problem I will happily switch.

What’s the best way to achieve this goal?