Currently, dev-libs/link-grammar fails its test suite on uClibc. The file tests/test-suite.log contains the error "link-grammar: Error: Affix dictionary: QUOTES: Invalid utf8 character". I searched the link-grammar source code for the "Invalid utf8 character" message and found that the failing call is mbsrtowcs. Next, I looked through the uClibc sources to see how that function can be configured. The documentation shipped inside the uClibc sources showed me that its ctype and wchar support is a mess.

Because link-grammar reads its dictionaries as UTF-8, and UTF-8 can be converted to wide-character strings using only bit masks and bit shifts (no big lookup table needed), I wrote my own UTF-8-only implementation of mbsrtowcs. I based it on the manual pages for mbsrtowcs and mbrtowc and on the Wikipedia article on UTF-8.

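The sketch below illustrates the bit-mask-and-shift approach. It is only an illustration and is not the attached implementation. The names `utf8_decode` and `utf8_mbsrtowcs` are hypothetical. The sketch drops the `mbstate_t` argument of the real mbsrtowcs because it converts only complete NUL-terminated strings. It also assumes that wchar_t can hold any Unicode code point, which holds for the 32-bit wchar_t on Linux.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <wchar.h>

/* Hypothetical helper: decode one UTF-8 sequence at s into *out.
 * Returns the number of bytes used (1-4), 0 for the terminating NUL,
 * or (size_t)-1 for an invalid or truncated sequence. */
static size_t utf8_decode(const unsigned char *s, uint32_t *out)
{
    unsigned char lead = s[0];
    uint32_t cp, min_cp;
    size_t len;

    if (lead < 0x80) {                  /* 0xxxxxxx: ASCII */
        *out = lead;
        return lead != 0 ? 1 : 0;
    } else if ((lead & 0xE0) == 0xC0) { /* 110xxxxx: 2-byte sequence */
        len = 2; cp = lead & 0x1F; min_cp = 0x80;
    } else if ((lead & 0xF0) == 0xE0) { /* 1110xxxx: 3-byte sequence */
        len = 3; cp = lead & 0x0F; min_cp = 0x800;
    } else if ((lead & 0xF8) == 0xF0) { /* 11110xxx: 4-byte sequence */
        len = 4; cp = lead & 0x07; min_cp = 0x10000;
    } else {
        return (size_t)-1;              /* stray continuation byte or 0xF8-0xFF */
    }

    for (size_t i = 1; i < len; i++) {
        /* Each continuation byte must look like 10xxxxxx. A NUL fails this
         * test too, so we never read past the end of the string. */
        if ((s[i] & 0xC0) != 0x80)
            return (size_t)-1;
        cp = (cp << 6) | (s[i] & 0x3F);
    }

    /* Reject overlong forms, UTF-16 surrogates and values past U+10FFFF. */
    if (cp < min_cp || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return (size_t)-1;

    *out = cp;
    return len;
}

/* Hypothetical UTF-8-only stand-in for mbsrtowcs(dst, src, len, ps).
 * Returns the number of wide characters converted (not counting the NUL),
 * or (size_t)-1 with errno = EILSEQ on an invalid sequence.
 * If dst is NULL, it only counts and leaves *src untouched. */
size_t utf8_mbsrtowcs(wchar_t *dst, const char **src, size_t len)
{
    const unsigned char *s = (const unsigned char *)*src;
    size_t count = 0;

    while (dst == NULL || count < len) {
        uint32_t cp;
        size_t n = utf8_decode(s, &cp);

        if (n == (size_t)-1) {
            if (dst != NULL)
                *src = (const char *)s;  /* leave *src at the bad sequence */
            errno = EILSEQ;
            return (size_t)-1;
        }
        if (dst != NULL)
            dst[count] = (wchar_t)cp;
        if (n == 0) {                    /* reached the terminating NUL */
            if (dst != NULL)
                *src = NULL;
            return count;
        }
        s += n;
        count++;
    }

    *src = (const char *)s;              /* dst is full: resume here next call */
    return count;
}
```

The validity checks follow the UTF-8 rules in RFC 3629: the decoder rejects overlong encodings, surrogates, and values above U+10FFFF. I have not verified whether glibc's or uClibc's mbsrtowcs applies exactly the same checks.
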
Before integrating it into link-grammar or anywhere else, I would like a code review. The attached source code is MIT-licensed. That means I can put it in any open source project without worrying about licensing issues, and so can you.

--
René Rhéaume |