Unicode

Unicode — Unicode utility functions.

Synopsis

typedef             librdf_unichar;
int                 librdf_unicode_char_to_utf8         (librdf_unichar c,
                                                         unsigned char *output,
                                                         int length);
int                 librdf_utf8_to_unicode_char         (librdf_unichar *output,
                                                         const unsigned char *input,
                                                         int length);
unsigned char *     librdf_utf8_to_latin1               (const unsigned char *input,
                                                         int length,
                                                         int *output_length);
unsigned char *     librdf_utf8_to_latin1_2             (const unsigned char *input,
                                                         size_t length,
                                                         unsigned char discard,
                                                         size_t *output_length);
unsigned char *     librdf_latin1_to_utf8               (const unsigned char *input,
                                                         int length,
                                                         int *output_length);
unsigned char *     librdf_latin1_to_utf8_2             (const unsigned char *input,
                                                         size_t length,
                                                         size_t *output_length);
void                librdf_utf8_print                   (const unsigned char *input,
                                                         int length,
                                                         FILE *stream);

Description

Utility functions to convert between UTF-8, Unicode codepoints and Latin-1. Redland uses UTF-8 for all strings (except where noted), but these may need to be converted to other Unicode encodings or downgraded, with loss, to Latin-1.

Details

librdf_unichar

typedef raptor_unichar librdf_unichar;

Unicode codepoint.


librdf_unicode_char_to_utf8 ()

int                 librdf_unicode_char_to_utf8         (librdf_unichar c,
                                                         unsigned char *output,
                                                         int length);

Convert a Unicode character to UTF-8 encoding.

Deprecated: use raptor_unicode_utf8_string_put_char() instead, noting that its length argument is a size_t.

If output is NULL, the function calculates the number of bytes needed rather than performing the conversion. The caller can use this to allocate space and then call the function again with the new buffer.

c :

Unicode character

output :

UTF-8 string buffer or NULL

length :

buffer size (will be truncated to size_t)

Returns :

bytes written to output buffer or <0 on failure
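
The NULL-output convention described above can be illustrated with a small standalone sketch. It does not call Redland or Raptor; sketch_char_to_utf8() and sketch_encode_alloc() are hypothetical names invented here, and the handling of surrogates and short buffers is an assumption rather than documented library behaviour.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper (not a Redland function): encode one codepoint as UTF-8.
 * With output == NULL only the required byte count is returned.
 * Returns the byte count, or -1 if c is invalid or the buffer is too small. */
int sketch_char_to_utf8(unsigned long c, unsigned char *output, size_t length)
{
  int size;

  if (c >= 0xD800 && c <= 0xDFFF)
    return -1;                       /* surrogates are not encodable */
  if (c < 0x80)          size = 1;
  else if (c < 0x800)    size = 2;
  else if (c < 0x10000)  size = 3;
  else if (c < 0x110000) size = 4;
  else return -1;

  if (!output)
    return size;                     /* size query only */
  if (length < (size_t)size)
    return -1;

  switch (size) {
    case 1:
      output[0] = (unsigned char)c;
      break;
    case 2:
      output[0] = (unsigned char)(0xC0 | (c >> 6));
      output[1] = (unsigned char)(0x80 | (c & 0x3F));
      break;
    case 3:
      output[0] = (unsigned char)(0xE0 | (c >> 12));
      output[1] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
      output[2] = (unsigned char)(0x80 | (c & 0x3F));
      break;
    default:
      output[0] = (unsigned char)(0xF0 | (c >> 18));
      output[1] = (unsigned char)(0x80 | ((c >> 12) & 0x3F));
      output[2] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
      output[3] = (unsigned char)(0x80 | (c & 0x3F));
      break;
  }
  return size;
}

/* Two-pass pattern: ask for the size, allocate, then encode for real.
 * The caller frees the returned NUL-terminated buffer. */
unsigned char *sketch_encode_alloc(unsigned long c, size_t *length_p)
{
  unsigned char *buffer;
  int written;
  int needed = sketch_char_to_utf8(c, NULL, 0);

  if (needed < 0)
    return NULL;
  buffer = malloc((size_t)needed + 1);
  if (!buffer)
    return NULL;
  written = sketch_char_to_utf8(c, buffer, (size_t)needed);
  assert(written == needed);
  (void)written;                     /* silence unused warning under NDEBUG */
  buffer[needed] = '\0';
  if (length_p)
    *length_p = (size_t)needed;
  return buffer;
}
```

For instance, sketch_encode_alloc(0x20AC, &n) would return the three bytes E2 82 AC and set n to 3.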

librdf_utf8_to_unicode_char ()

int                 librdf_utf8_to_unicode_char         (librdf_unichar *output,
                                                         const unsigned char *input,
                                                         int length);

Convert a UTF-8 encoded buffer to a Unicode character.

Deprecated: use raptor_unicode_utf8_string_get_char() instead, noting that the argument order has changed to input, length (a size_t), output.

If output is NULL, the function only calculates the number of bytes that would be used from the input buffer and does not perform the conversion.

output :

Pointer to the Unicode character or NULL

input :

UTF-8 string buffer

length :

buffer size (will be truncated to size_t)

Returns :

bytes used from input buffer or <0 on failure
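
As a rough illustration of the decoding direction, the following standalone sketch reads one UTF-8 sequence and reports how many input bytes it consumed. sketch_utf8_to_char() is a hypothetical name, not a Redland function, and its rejection of overlong forms, surrogates and truncated input is an assumption about sensible behaviour rather than a statement of what the library does. Unlike the function documented above, this sketch still validates the sequence when output is NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a Redland function): decode the first UTF-8
 * sequence in input. With output == NULL the codepoint is not stored.
 * Returns the number of bytes consumed, or -1 on malformed input. */
int sketch_utf8_to_char(unsigned long *output,
                        const unsigned char *input, size_t length)
{
  unsigned long c, min;
  int size, i;

  assert(input != NULL);
  if (length < 1)
    return -1;

  /* The lead byte determines the sequence length and the payload bits. */
  c = input[0];
  if (c < 0x80)                { size = 1; min = 0; }
  else if ((c & 0xE0) == 0xC0) { size = 2; min = 0x80;    c &= 0x1F; }
  else if ((c & 0xF0) == 0xE0) { size = 3; min = 0x800;   c &= 0x0F; }
  else if ((c & 0xF8) == 0xF0) { size = 4; min = 0x10000; c &= 0x07; }
  else
    return -1;                 /* stray continuation or invalid lead byte */

  if (length < (size_t)size)
    return -1;                 /* truncated sequence */

  /* Each continuation byte must look like 10xxxxxx and adds 6 bits. */
  for (i = 1; i < size; i++) {
    if ((input[i] & 0xC0) != 0x80)
      return -1;
    c = (c << 6) | (input[i] & 0x3F);
  }

  /* Reject overlong encodings, surrogates and out-of-range values. */
  if (c < min || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
    return -1;

  if (output)
    *output = c;
  return size;
}
```

A caller walking a whole string would add the return value to its input offset after each successful call and stop on a negative result.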

librdf_utf8_to_latin1 ()

unsigned char *     librdf_utf8_to_latin1               (const unsigned char *input,
                                                         int length,
                                                         int *output_length);

Convert a UTF-8 string to ISO Latin-1.

Converts the given UTF-8 string to the ISO Latin-1 subset of Unicode (characters 0x00-0xff), discarding any out of range characters.

Deprecated: use librdf_utf8_to_latin1_2() instead, which takes and returns size_t sizes and allows out-of-range characters to be replaced.

If output_length is not NULL, the returned string length will be stored there.

input :

UTF-8 string buffer

length :

buffer size (will be truncated to size_t)

output_length :

Pointer to variable to store resulting string length or NULL

Returns :

pointer to new ISO Latin-1 string or NULL on failure

librdf_utf8_to_latin1_2 ()

unsigned char *     librdf_utf8_to_latin1_2             (const unsigned char *input,
                                                         size_t length,
                                                         unsigned char discard,
                                                         size_t *output_length);

Convert a UTF-8 string to ISO Latin-1.

Converts the given UTF-8 string to the ISO Latin-1 subset of Unicode (characters 0x00-0xff). Each out-of-range character is replaced with discard; if discard is NUL (\0), out-of-range characters are dropped instead.

If output_length is not NULL, the returned string length will be stored there.

input :

UTF-8 string buffer

length :

buffer size

discard :

character used to replace out-of-range characters, or NUL (\0) to discard them

output_length :

Pointer to variable to store resulting string length or NULL

Returns :

pointer to new ISO Latin-1 string or NULL on failure
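
The replace-or-drop behaviour of the discard argument can be sketched without the library as follows. sketch_utf8_to_latin1() and sketch_decode() are hypothetical names; returning NULL on malformed UTF-8 and NUL-terminating the result are assumptions made for the sketch, and the compact decoder does not reject overlong forms.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical compact decoder: returns bytes consumed or -1.
 * The caller guarantees left >= 1. Overlong forms are not rejected. */
static int sketch_decode(const unsigned char *s, size_t left, unsigned long *c)
{
  int size, i;

  if (s[0] < 0x80) { *c = s[0]; return 1; }
  if ((s[0] & 0xE0) == 0xC0)      { size = 2; *c = s[0] & 0x1F; }
  else if ((s[0] & 0xF0) == 0xE0) { size = 3; *c = s[0] & 0x0F; }
  else if ((s[0] & 0xF8) == 0xF0) { size = 4; *c = s[0] & 0x07; }
  else return -1;
  if (left < (size_t)size)
    return -1;
  for (i = 1; i < size; i++) {
    if ((s[i] & 0xC0) != 0x80)
      return -1;
    *c = (*c << 6) | (s[i] & 0x3F);
  }
  return size;
}

/* Hypothetical helper (not a Redland function): downgrade UTF-8 to Latin-1.
 * Codepoints above 0xFF become discard, or vanish when discard is NUL.
 * The caller frees the returned NUL-terminated buffer. */
unsigned char *sketch_utf8_to_latin1(const unsigned char *input, size_t length,
                                     unsigned char discard,
                                     size_t *output_length)
{
  unsigned char *out;
  size_t i = 0, j = 0;

  assert(input != NULL);
  /* Latin-1 output is never longer than the UTF-8 input; +1 for the NUL. */
  out = malloc(length + 1);
  if (!out)
    return NULL;

  while (i < length) {
    unsigned long c;
    int used = sketch_decode(input + i, length - i, &c);
    if (used < 0) {
      free(out);
      return NULL;
    }
    i += (size_t)used;
    if (c <= 0xFF)
      out[j++] = (unsigned char)c;   /* representable in Latin-1 */
    else if (discard)
      out[j++] = discard;            /* replace */
    /* else: drop the character entirely */
  }
  out[j] = '\0';
  if (output_length)
    *output_length = j;
  return out;
}
```

With the UTF-8 input "café €" and discard set to '?', this sketch yields the six Latin-1 bytes of "café ?"; with discard set to NUL it yields the five bytes of "café ".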

librdf_latin1_to_utf8 ()

unsigned char *     librdf_latin1_to_utf8               (const unsigned char *input,
                                                         int length,
                                                         int *output_length);

Convert an ISO Latin-1 encoded string to UTF-8.

Converts the given ISO Latin-1 string to a UTF-8 encoded string representing the same content. The conversion is lossless.

Deprecated: use librdf_latin1_to_utf8_2() instead, which takes and returns size_t sizes.

If output_length is not NULL, the returned string length will be stored there.

input :

ISO Latin-1 string buffer

length :

buffer size (will be truncated to size_t)

output_length :

Pointer to variable to store resulting string length or NULL

Returns :

pointer to new UTF-8 string or NULL on failure

librdf_latin1_to_utf8_2 ()

unsigned char *     librdf_latin1_to_utf8_2             (const unsigned char *input,
                                                         size_t length,
                                                         size_t *output_length);

Convert an ISO Latin-1 encoded string to UTF-8.

Converts the given ISO Latin-1 string to a UTF-8 encoded string representing the same content. The conversion is lossless.

If output_length is not NULL, the returned string length will be stored there.

input :

ISO Latin-1 string buffer

length :

buffer size

output_length :

Pointer to variable to store resulting string length or NULL

Returns :

pointer to new UTF-8 string or NULL on failure
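
The conversion is lossless because every Latin-1 byte is also a Unicode codepoint in the range 0x00-0xff, so it encodes as either one or two UTF-8 bytes. A minimal standalone sketch of that mapping follows; sketch_latin1_to_utf8() is a hypothetical name, not a Redland function, and the NUL terminator on the result is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper (not a Redland function): convert Latin-1 to UTF-8.
 * The caller frees the returned NUL-terminated buffer. */
unsigned char *sketch_latin1_to_utf8(const unsigned char *input, size_t length,
                                     size_t *output_length)
{
  unsigned char *out;
  size_t i, j = 0;

  assert(input != NULL);
  /* Worst case: every byte is >= 0x80 and expands to 2 bytes; +1 for NUL. */
  out = malloc(2 * length + 1);
  if (!out)
    return NULL;

  for (i = 0; i < length; i++) {
    unsigned char c = input[i];
    if (c < 0x80) {
      out[j++] = c;                                  /* ASCII: unchanged */
    } else {
      out[j++] = (unsigned char)(0xC0 | (c >> 6));   /* lead byte C2 or C3 */
      out[j++] = (unsigned char)(0x80 | (c & 0x3F)); /* continuation byte */
    }
  }
  out[j] = '\0';
  if (output_length)
    *output_length = j;
  return out;
}
```

For example, the four Latin-1 bytes 63 61 66 E9 ("café") become the five UTF-8 bytes 63 61 66 C3 A9.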

librdf_utf8_print ()

void                librdf_utf8_print                   (const unsigned char *input,
                                                         int length,
                                                         FILE *stream);

Print a UTF-8 string to a stream.

Pretty-prints the UTF-8 string, writing characters that fail the isprint() test in a pseudo-C escape format such as \u followed by hex digits.

input :

UTF-8 string buffer

length :

buffer size (will be truncated to size_t)

stream :

FILE* stream
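
One way such escaped printing could work is sketched below. It is not the Redland implementation: sketch_utf8_print() and sketch_decode() are hypothetical names, and the exact escape widths (\u with four hex digits, \U with eight), the choice to escape every non-ASCII codepoint, and stopping silently at malformed input are all assumptions made for illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical compact decoder: returns bytes consumed or -1.
 * The caller guarantees left >= 1. Overlong forms are not rejected. */
static int sketch_decode(const unsigned char *s, size_t left, unsigned long *c)
{
  int size, i;

  if (s[0] < 0x80) { *c = s[0]; return 1; }
  if ((s[0] & 0xE0) == 0xC0)      { size = 2; *c = s[0] & 0x1F; }
  else if ((s[0] & 0xF0) == 0xE0) { size = 3; *c = s[0] & 0x0F; }
  else if ((s[0] & 0xF8) == 0xF0) { size = 4; *c = s[0] & 0x07; }
  else return -1;
  if (left < (size_t)size)
    return -1;
  for (i = 1; i < size; i++) {
    if ((s[i] & 0xC0) != 0x80)
      return -1;
    *c = (*c << 6) | (s[i] & 0x3F);
  }
  return size;
}

/* Hypothetical helper (not a Redland function): print printable ASCII
 * as-is and everything else as a \uXXXX or \UXXXXXXXX escape. */
void sketch_utf8_print(const unsigned char *input, size_t length, FILE *stream)
{
  size_t i = 0;

  assert(input != NULL && stream != NULL);
  while (i < length) {
    unsigned long c;
    int used = sketch_decode(input + i, length - i, &c);
    if (used < 0)
      return;                        /* stop at malformed input */
    i += (size_t)used;

    if (c < 0x80 && isprint((int)c))
      fputc((int)c, stream);
    else if (c < 0x10000)
      fprintf(stream, "\\u%04lX", c);
    else
      fprintf(stream, "\\U%08lX", c);
  }
}
```

Restricting isprint() to codepoints below 0x80 keeps the sketch independent of the current locale, since isprint() is only defined for values representable as unsigned char and its result for bytes above 0x7f varies by locale.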