I've created an SQLite database with the default UTF-8 encoding.
I then insert data with the following statement:
    strcpy(sql,"insert into blog(title) values('呵呵')");
    sqlite3_exec(db,sql,0,0,0);
When I open the database with a tool called SQLite Developer, the title field shows garbage (ºǺ�) with Data encoding set to UNICODE. When I change Data encoding to ANSI, the title displays correctly.
As far as I know, the sqlite3_exec prototype is:
    int sqlite3_exec(
      sqlite3*,                                  /* An open database */
      const char *sql,                           /* SQL to be evaluated */
      int (*callback)(void*,int,char**,char**),  /* Callback function */
      void *,                                    /* 1st argument to callback */
      char **errmsg                              /* Error msg written here */
    );
I also tried passing a wchar_t string as sql, but that did not work either.
My Visual C++ project already defines _UNICODE. So my question is: how do I store UTF-8 encoded data in sqlite3 using Visual C++?
Inspired by msandiford, I now use iconv to convert the GBK-encoded statement to UTF-8. Thanks so much, msandiford.
    char* pOut;
    char* pIn;
    size_t inLen, outLen = 1999;           /* leave room for the terminating NUL */
    strcpy(sql,"insert into blog(title) values('呵呵')");
    string strSQL = sql;
    char* sql2 = (char*)malloc(2000);
    memset(sql2,0,2000);
    pOut = sql2;                           /* point at the buffer itself, not at &sql2 */
    inLen = strlen(strSQL.c_str());
    pIn = const_cast<char*>(strSQL.c_str());
    iconv_t g2u8 = iconv_open("UTF-8","GBK");   /* to UTF-8, from GBK */
    iconv(g2u8,(const char**)&pIn,&inLen,&pOut,&outLen);
    iconv_close(g2u8);
    sqlite3_exec(db,sql2,0,0,0);
    free(sql2);
Collecting comments into answer form:
From the question comments, apparently the source files are not encoded in UTF-8. Converting to UTF-8 or using the UTF-8 encoding directly seems to work.
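One way to confirm this diagnosis is to look at the bytes the compiler actually stored for the literal. The following is only a minimal sketch, not part of SQLite or any library: `is_valid_utf8` is a hypothetical helper name, and the example assumes the GBK bytes for 呵 are BA C7, which matches the ANSI display reported in the question.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: returns true if `s` is well-formed UTF-8.
// Rejects stray continuation bytes, truncated sequences, overlong
// forms, surrogates, and values above U+10FFFF.
bool is_valid_utf8(const std::string& s) {
    const unsigned char* p = reinterpret_cast<const unsigned char*>(s.data());
    const std::size_t n = s.size();
    // Smallest code point that legitimately needs 2, 3, or 4 bytes.
    static const unsigned long min_cp[] = {0, 0, 0x80, 0x800, 0x10000};
    std::size_t i = 0;
    while (i < n) {
        const unsigned char c = p[i];
        std::size_t len;
        unsigned long cp;
        if (c < 0x80)                { len = 1; cp = c; }
        else if ((c & 0xE0) == 0xC0) { len = 2; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07; }
        else return false;                 // invalid lead byte
        if (i + len > n) return false;     // sequence runs past the end
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char cc = p[i + k];
            if ((cc & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (cc & 0x3F);
        }
        if (len > 1 && cp < min_cp[len]) return false;   // overlong form
        if (cp > 0x10FFFF) return false;                 // beyond Unicode
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // surrogate
        i += len;
    }
    return true;
}
```

Calling `is_valid_utf8(sql)` just before `sqlite3_exec` would return false for a statement compiled from a GBK-encoded source file and true once the literal really is UTF-8.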
Using UTF-8 encoding directly:
    strcpy(sql,"insert into blog (title) values ('\xE5\x91\xB5\xE5\x91\xB5')");
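The escape bytes above are simply the UTF-8 encoding of the code point U+5475 (呵), written out twice. As an illustration of where they come from, here is a minimal sketch of a code-point-to-UTF-8 encoder; `append_utf8` is a hypothetical helper name, not something provided by SQLite or the standard library.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: appends the UTF-8 encoding of one Unicode
// code point to `out`. Returns false (and appends nothing) for
// surrogates and for values above U+10FFFF.
bool append_utf8(std::string& out, unsigned long cp) {
    if (cp > 0x10FFFF) return false;
    if (cp >= 0xD800 && cp <= 0xDFFF) return false;
    if (cp < 0x80) {                       // 1 byte: 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {               // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {             // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                               // 4 bytes: 11110xxx + three continuation bytes
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return true;
}
```

For example, calling `append_utf8(title, 0x5475)` twice on an empty `std::string title` leaves the six bytes E5 91 B5 E5 91 B5 in it, the same bytes as in the escaped literal.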
You could avoid having to convert all your source files to UTF-8 by doing something like this:
    sprintf(sql, "insert into blog (title) values('%s')", AnsiToUtf8("呵呵"));
The AnsiToUtf8() function here is hypothetical, and any implementation of it is going to be pretty platform-specific.
Looking further into this, it appears that Visual Studio saves source files in the default encoding for your Windows locale settings. Based on this, there could potentially be an assortment of encodings if your dev team's computers are set up for different locales.
I think it would be quite difficult, if not impossible, to implement an AnsiToUtf8() function that would cope in all the possible cases, especially given that the locale settings for the computer that the code is developed on may not be the same as the computer that ultimately runs the code.
I think the cleanest way to resolve this would be to use UTF-8 encoding uniformly in source files, assuming you want to use code points in string literals outside the areas where the default encoding and Unicode overlap.
Another way would be to internationalise the code so that the source files do not contain extended characters, and use something like GNU gettext to handle translations.