
Unicode text search strategies

Posted: Mon Nov 21, 2005 1:40 pm
by martindholmes
Hi there,

Since Unicode text searching isn't implemented yet, I've been trying to create a workaround: search a plain-text copy of the document, then set the selection with RVLinear:

RVSetSelection(MyRichViewEdit, HitLocation, Length(SearchString));

This works except where the text contains bulleted or numbered lists: the bullets and numbers are missing from the plain-text version. I'm getting the plain-text copy by selecting everything in the control and then calling GetSelTextW. Is there any way to get a plain-text copy that includes the bullet/numbering characters, so that my search offsets are correct?

All help appreciated,
Martin

Posted: Mon Nov 21, 2005 2:50 pm
by Michel
Hi Martin,

There is a GetAllText() function in the RVGetText unit, but I have no idea what it does with lists, sorry.

Michel

Posted: Tue Nov 22, 2005 1:06 pm
by martindholmes
Thanks for the suggestion. I tried it, but RVGetAllText (in RVGetTextW) gives the same result: a WideString, but without the list markers. I'm looking at other code in the same unit, though.

Sergey, do you know any way to do this?

Cheers,
Martin

Posted: Tue Nov 22, 2005 1:39 pm
by Michel
I have only one scary idea: iterate through all the RichView items yourself, get the text of each item, search in it, and so on. You'd obviously have to do this recursively to handle tables and their cells. Come to think of it, it's not too scary.
Michel
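Michel's item-by-item idea can be sketched roughly as follows. It is written in Python for brevity rather than Delphi, and TextItem, NonTextItem, TableItem and find_in_items are hypothetical stand-ins, not TRichView classes; a real version would walk the editor's items and recurse into the cells of each table. Each hit is recorded as a path of indices (item, then cell, then item inside the cell) plus an offset within that item, not as a linear position. It also cannot find a phrase that is split across two items, for example one whose formatting changes partway through.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union


# Hypothetical document model (NOT the TRichView API).
@dataclass
class TextItem:
    text: str


@dataclass
class NonTextItem:
    kind: str  # e.g. "bullet", "picture"


@dataclass
class TableItem:
    cells: List[List["Item"]] = field(default_factory=list)  # each cell is a list of items


Item = Union[TextItem, NonTextItem, TableItem]


def find_in_items(items: List[Item], needle: str,
                  path: Tuple[int, ...] = ()) -> List[Tuple[Tuple[int, ...], int]]:
    """Return (path, offset) for every occurrence of needle inside a single text item.

    path lists item indices from the top level down; after each table's
    index comes the index of the cell being searched.
    """
    hits = []
    for i, item in enumerate(items):
        if isinstance(item, TextItem):
            start = item.text.find(needle)
            while start != -1:
                hits.append((path + (i,), start))
                start = item.text.find(needle, start + 1)
        elif isinstance(item, TableItem):
            # Recurse into every cell, extending the path with (table index, cell index).
            for c, cell in enumerate(item.cells):
                hits.extend(find_in_items(cell, needle, path + (i, c)))
        # NonTextItem (e.g. a list marker) has no searchable text, so it is skipped.
    return hits
```

For example, with `doc = [NonTextItem("bullet"), TextItem("Unicode search"), TableItem([[TextItem("no hit")], [TextItem("search here")]])]`, `find_in_items(doc, "search")` returns `[((1,), 8), ((2, 1, 0), 0)]`: one hit in the second top-level item and one in the first item of the table's second cell.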

Posted: Tue Nov 22, 2005 3:16 pm
by Sergey Tkachenko
As far as I remember, RVLinear's functions treat list markers like any other non-text item (except for tables), i.e. each one is treated as a single space character.
This should not cause a problem for RVSetSelection if its parameters were obtained from RVGetSelection or RVGetLinearCaretPos.

The only text export function that returns a string with a one-to-one correspondence to positions in the document is GetTextRange from RVLinear.pas. It treats every non-text item (except for tables) as one space character. Other functions may ignore non-text items or return multi-character text representations of them.
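Why the one-to-one correspondence matters can be shown with a toy model. In this Python sketch, Marker, linear_text, plain_text and find_selection are hypothetical names, not RVLinear functions; the model ignores tables and paragraph breaks and only assumes, as described above, that each non-text item occupies exactly one position and is rendered as a space.

```python
from typing import List, Optional, Tuple, Union


class Marker:
    """Hypothetical stand-in for any non-text item, such as a list bullet."""


Item = Union[str, Marker]


def linear_text(items: List[Item]) -> str:
    # Each non-text item becomes exactly one space, so string index == linear position.
    return "".join(" " if isinstance(item, Marker) else item for item in items)


def plain_text(items: List[Item]) -> str:
    # What a marker-dropping export produces: offsets drift after every marker.
    return "".join(item for item in items if not isinstance(item, Marker))


def find_selection(items: List[Item], needle: str) -> Optional[Tuple[int, int]]:
    """Return (start, length) in linear positions, or None if needle is absent."""
    start = linear_text(items).find(needle)
    return None if start == -1 else (start, len(needle))
```

With `doc = [Marker(), "apple ", Marker(), "pear"]`, `find_selection(doc, "pear")` gives `(8, 4)`, while searching `plain_text(doc)` finds "pear" at 6: the two dropped markers are exactly the drift described in the original question. The offsets here are zero-based Python indices; whether RVSetSelection counts the same way should be checked against RVLinear.pas. Because markers become spaces, a search string containing a space can also match across a marker.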

Posted: Wed Nov 23, 2005 12:54 pm
by martindholmes
Unfortunately, GetTextRange returns a String, not a WideString, so I can't use it.

Do you think there's any other way?

Cheers,
Martin