Unicode in TRichView
Unicode is a worldwide character-encoding standard. Unicode simplifies localization of software and improves multilingual text processing. By implementing it in an application, a developer can enable the application with universal data exchange capabilities for global marketing, using a single binary file for every possible character code.
In Delphi 2009 and newer, Unicode is the default encoding for strings.
All strings in TRichView are Unicode strings.
Text Files
LoadText and LoadTextFromStream load ANSI text files. The code page for conversion to Unicode is specified in an optional parameter.
LoadTextW and LoadTextFromStreamW load Unicode text files.
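The code page matters because the same ANSI bytes map to different Unicode characters under different code pages. The following is a minimal standalone sketch in Python, not TRichView code; the helper name decode_ansi and the sample text are assumptions made for illustration only.

```python
def decode_ansi(raw: bytes, code_page: str) -> str:
    """Convert ANSI bytes to a Unicode string using the given code page.

    Hypothetical helper for illustration; it is not part of TRichView.
    """
    return raw.decode(code_page)


# Bytes as a Cyrillic (Windows-1251) editor would write them to an ANSI file.
raw = "Привет".encode("cp1251")

as_cyrillic = decode_ansi(raw, "cp1251")  # correct code page: original text
as_western = decode_ansi(raw, "cp1252")   # wrong code page: unreadable text

print(as_cyrillic)
print(as_western)
```

If the wrong code page is used, no error is reported; the text is simply converted to the wrong characters, so the code page should match the one used when the file was written.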
Note: you can test whether a file is Unicode with the function
function RV_TestFileUnicode(const FileName: TRVUnicodeString): TRVUnicodeTestResult
defined in RVUni.pas.
Return values
▪rvutNo – the file is not Unicode (odd size);
▪rvutYes – the file is most likely Unicode (UTF-16): its size is even, and it has Unicode byte-order characters at the start or contains #0 in the text (only the first 500 bytes are checked);
▪rvutProbably – the file can contain Unicode (even size);
▪rvutEmpty – the file is empty;
▪rvutError – error opening the file.
You can also use the WinAPI function IsTextUnicode, which performs more advanced tests.
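The return values listed above suggest a simple heuristic. The sketch below restates it in standalone Python to make the decision order explicit. It is not the source of RV_TestFileUnicode: the function name classify_file, the order of the checks, and the use of plain strings instead of TRVUnicodeTestResult values are all assumptions.

```python
import os

# UTF-16 little-endian and big-endian byte-order marks.
UTF16_BOMS = (b"\xff\xfe", b"\xfe\xff")


def classify_file(path: str) -> str:
    """Guess whether a file is UTF-16, following the rules listed above.

    Hypothetical re-implementation for illustration; the order of the
    checks is an assumption.
    """
    try:
        size = os.path.getsize(path)
        with open(path, "rb") as f:
            head = f.read(500)  # only the first 500 bytes are inspected
    except OSError:
        return "rvutError"
    if size == 0:
        return "rvutEmpty"
    if size % 2 == 1:
        return "rvutNo"  # UTF-16 code units are 2 bytes, so the size must be even
    if head.startswith(UTF16_BOMS) or b"\x00" in head:
        return "rvutYes"  # byte-order mark or zero byte found
    return "rvutProbably"  # even size, but no stronger evidence
```

An even-sized ANSI file without zero bytes is reported as rvutProbably, so this kind of test cannot give a definite answer for such files.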
SaveText saves an ANSI text file. Unicode strings are converted based on the Style.DefCodePage property.
SaveTextW saves a Unicode text file. ANSI strings are converted based on the corresponding Charsets.
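Saving as ANSI can lose information, because a code page covers only a small subset of Unicode. The standalone Python sketch below illustrates this; it is not TRichView code. The helper name to_ansi is hypothetical, and replacing unrepresentable characters with "?" is Python's behavior, used here as a stand-in, since this page does not state what TRichView writes in that case.

```python
def to_ansi(text: str, code_page: str) -> bytes:
    """Convert a Unicode string to ANSI bytes for the given code page.

    Hypothetical helper for illustration. Characters missing from the
    code page are replaced with '?' (Python's 'replace' error handler).
    """
    return text.encode(code_page, errors="replace")


text = "Привет"

cyrillic_bytes = to_ansi(text, "cp1251")  # code page contains Cyrillic letters
western_bytes = to_ansi(text, "cp1252")   # code page has no Cyrillic letters

print(cyrillic_bytes.decode("cp1251"))  # the text survives the round trip
print(western_bytes)                    # every letter was replaced
```

Saving as Unicode avoids this problem, since no conversion to a code page is needed.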
RTF (Rich Text Format) and DocX files
RTF and DocX files can contain Unicode text.
HTML
SaveHTML*** can save ANSI or Unicode (UTF-8) HTML files. In ANSI HTML files, Unicode characters are written as numeric codes (&#NNNN;), so all Unicode characters are preserved, but the file size increases. For this reason, saving HTML in UTF-8 encoding is highly recommended.
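The size difference between the two encodings can be seen in a small experiment. The standalone Python sketch below is not TRichView code and does not reproduce its HTML output; the sample text and the choice of Windows-1252 as the ANSI code page are assumptions. Here only the characters missing from the code page are written as &#NNNN; codes.

```python
import html

text = "Grüße, Привет, 日本語"

# ANSI output: characters missing from the code page become &#NNNN; codes.
ansi_html = text.encode("cp1252", errors="xmlcharrefreplace")

# UTF-8 output: every character is stored directly.
utf8_html = text.encode("utf-8")

print(ansi_html)
print(len(ansi_html), len(utf8_html))

# Both forms preserve the text: decoding the numeric codes restores it.
restored = html.unescape(ansi_html.decode("cp1252"))
```

Each numeric code takes 7 or 8 bytes in this sample, while UTF-8 needs at most 4 bytes per character, so the ANSI form grows quickly for text that is mostly outside the code page.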
GetSelTextA returns the selection as an ANSI string. Unicode text is converted based on the Style.DefCodePage property.
GetSelTextW returns the selection as a Unicode string.
Text searching methods have versions for searching for ANSI and for Unicode strings: TRichView.SearchTextA/SearchTextW. However, SearchTextA simply converts the string to Unicode (using Style.DefCodePage) and calls SearchTextW.
CopyTextA copies the selection as ANSI text. Unicode strings are converted based on the Style.DefCodePage property.
CopyTextW copies the selection as Unicode.
Copy and CopyDef copy Unicode text (if rvoAutoCopyUnicodeText is included in Options).
If text is pasted using the Paste method and text is available in the Clipboard, the method pastes it as Unicode text.
PasteTextA pastes ANSI text, PasteTextW pastes Unicode text.
InsertTextFromFile: the file must be ANSI (converted, if needed)
InsertOEMTextFromFile: the file must be OEM (converted, if needed)
InsertTextFromFileW: the file must be Unicode (converted, if needed)
InsertText, InsertStringTag add a Unicode string in Delphi/C++Builder 2009+ and an ANSI string in older versions of Delphi/C++Builder.
InsertTextA, InsertStringATag add an ANSI string (converted, if needed)
InsertTextW, InsertStringWTag add a Unicode string (converted, if needed)
Applications compiled with older versions of TRichView (earlier than version 1.2) cannot load RVF files containing Unicode.
RVF files are loaded correctly even if the Unicode flags in text styles are mismatched (the file was saved with a different RVStyle than the one used for loading). Conversions are performed if required (for example, when loading old RVF files in applications compiled in Delphi/C++Builder 2009+). Two RVF warnings, rvfwConvToUnicode and rvfwConvFromUnicode, indicate whether any conversion took place.
TRichView v11 introduces a change in RVF files that allows String properties to be stored as Unicode. RVF files saved in Delphi/C++Builder 2009+ are saved as RVF version 1.3.1; RVF files saved in older versions of Delphi/C++Builder are saved as RVF version 1.3.