Gentoo Archives: gentoo-embedded

From: "René Rhéaume" <rene.rheaume@×××××.com>
To: gentoo-embedded@l.g.o
Subject: [gentoo-embedded] dev-libs/link-grammar and uclibc mbsrtowcs
Date: Sun, 26 Jun 2016 17:36:00
Message-Id: CAPLjCyK8LE5ECvZ+cJ_OV51SJyozc4cAcUguo30CREv63-nkPg@mail.gmail.com
Currently, dev-libs/link-grammar fails its test suite on uclibc. The
file tests/test-suite.log contains the message "link-grammar: Error:
Affix dictionary: QUOTES: Invalid utf8 character". By searching the
link-grammar source code for the "Invalid utf8 character" message, I
found that it is the call to mbsrtowcs that fails. I looked through the
uClibc sources to see how that function can be configured, and from the
documentation inside the uClibc sources, I learned that its ctype and
wchar support is a mess.
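As a minimal sketch of how such a failure surfaces to a caller: mbsrtowcs() decodes according to the current LC_CTYPE locale, and when it meets a byte sequence it cannot decode it returns (size_t)-1, sets errno to EILSEQ and leaves the source pointer at the rejected bytes. The helper convert_or_report() is hypothetical and is not link-grammar's code, which reports the error in its own way.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper (not link-grammar code): convert a NUL-terminated
 * multibyte string with the C library's mbsrtowcs() under the current
 * LC_CTYPE locale and report where the conversion stopped on failure.
 * Returns the number of wide characters stored in buf (excluding the
 * terminator), or (size_t)-1 on an encoding error. */
size_t convert_or_report(const char *text, wchar_t *buf, size_t buflen)
{
    mbstate_t state;
    const char *p = text;
    size_t r;

    /* buf must be non-NULL: only then does mbsrtowcs update p */
    assert(buf != NULL);

    memset(&state, 0, sizeof state);  /* start in the initial conversion state */
    r = mbsrtowcs(buf, &p, buflen, &state);
    if (r == (size_t)-1) {
        /* errno is EILSEQ and p points at the rejected byte sequence */
        int saved = errno;
        fprintf(stderr, "mbsrtowcs: %s at byte offset %ld\n",
                strerror(saved), (long)(p - text));
        errno = saved;  /* keep EILSEQ visible to the caller */
    }
    return r;
}
```

In a program that has called setlocale(LC_CTYPE, "") under a UTF-8 locale, a C library with working UTF-8 support should convert non-ASCII input here; the EILSEQ branch is presumably where a failure like the one in the test log would show up.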
Because link-grammar reads its input as UTF-8, and UTF-8 can be
translated to wide-character strings using only bit masks and bit
shifts (no big fat table needed), I wrote my own UTF-8-only
implementation of mbsrtowcs, working from the manual pages for
mbsrtowcs and mbrtowc and the Wikipedia article on UTF-8.
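The mask-and-shift approach described above can be sketched as follows. This is an illustrative reimplementation, not the attached mbsrtowcs.c, and it rests on stated assumptions: the name utf8_mbsrtowcs is hypothetical, the mbstate_t argument is dropped because the sketch only decodes complete NUL-terminated strings, and wchar_t is assumed wide enough to hold any Unicode code point (32 bits on Linux). It aims to follow the mbsrtowcs contract from the manual page: at most len wide characters are written, a terminating L'\0' is stored but not counted, *src becomes NULL at the end of the string, and an invalid sequence yields (size_t)-1 with errno set to EILSEQ.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <wchar.h>

/* Decode one UTF-8 sequence starting at s. On success, store the code
 * point in *out and return the sequence length (1-4); return 0 for an
 * invalid lead byte, bad continuation byte, overlong form, surrogate or
 * value above U+10FFFF. Continuation bytes are checked one at a time, so
 * a NUL inside a truncated sequence fails the 10xxxxxx test and nothing
 * past the terminator is read. */
static size_t utf8_decode_one(const unsigned char *s, unsigned long *out)
{
    unsigned long cp;
    size_t n, i;

    if (s[0] < 0x80) {                         /* 0xxxxxxx: ASCII */
        *out = s[0];
        return 1;
    } else if (s[0] >= 0xC2 && s[0] <= 0xDF) { /* 110xxxxx (0xC0/0xC1 are overlong) */
        cp = s[0] & 0x1F;
        n = 2;
    } else if (s[0] >= 0xE0 && s[0] <= 0xEF) { /* 1110xxxx */
        cp = s[0] & 0x0F;
        n = 3;
    } else if (s[0] >= 0xF0 && s[0] <= 0xF4) { /* 11110xxx (above 0xF4 exceeds U+10FFFF) */
        cp = s[0] & 0x07;
        n = 4;
    } else {
        return 0;  /* stray continuation byte or invalid lead byte */
    }

    for (i = 1; i < n; i++) {
        if ((s[i] & 0xC0) != 0x80)             /* must be 10xxxxxx */
            return 0;
        cp = (cp << 6) | (s[i] & 0x3F);        /* append 6 payload bits */
    }

    /* Reject overlong 3/4-byte forms, UTF-16 surrogates, out-of-range values. */
    if ((n == 3 && cp < 0x800) || (n == 4 && cp < 0x10000) ||
        (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
        return 0;

    *out = cp;
    return n;
}

/* Hypothetical UTF-8-only stand-in for mbsrtowcs (no mbstate_t).
 * If dst is NULL, len is ignored and characters are only counted. */
size_t utf8_mbsrtowcs(wchar_t *dst, const char **src, size_t len)
{
    const unsigned char *s;
    size_t count = 0, n;
    unsigned long cp = 0;

    assert(src != NULL && *src != NULL);       /* required, as for mbsrtowcs */
    s = (const unsigned char *)*src;

    for (;;) {
        if (dst != NULL && count == len) {
            *src = (const char *)s;            /* buffer full: resume here */
            return count;
        }
        if (*s == '\0') {
            if (dst != NULL) {
                dst[count] = L'\0';            /* stored but not counted */
                *src = NULL;
            }
            return count;
        }
        n = utf8_decode_one(s, &cp);
        if (n == 0) {
            if (dst != NULL)
                *src = (const char *)s;        /* point at the bad sequence */
            errno = EILSEQ;
            return (size_t)-1;
        }
        if (dst != NULL)
            dst[count] = (wchar_t)cp;
        count++;
        s += n;
    }
}
```

The range checks after the mask-and-shift loop reject overlong encodings, UTF-16 surrogates and values above U+10FFFF, which RFC 3629 forbids; without them, the bit operations alone would accept byte sequences that are not valid UTF-8.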
But before integrating it into link-grammar or anywhere else, I would
like a code review. The attached source code is MIT-licensed, so I can
put it in any open source project I want without worrying about
license issues, and so can you.
--
René Rhéaume

Attachments

mbsrtowcs.c (text/x-csrc)

Replies

Re: [gentoo-embedded] dev-libs/link-grammar and uclibc mbsrtowcs, from "Anthony G. Basile" <basile@××××××××××××××.edu>