Boost_tokenizer

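When splitting a multibyte string on a given separator, I want to keep the separator itself (naturally also a multibyte character) in the output. wcstok and strtok swallow the delimiter. So, for the first time in a while, I turned to Boost.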

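#include <iostream>
#include <string>
#include <clocale>   // setlocale
#include <cstdlib>   // wcstombs
#include <boost/tokenizer.hpp>
using namespace std;
using namespace boost;

int main() {
    // Use the environment's locale so wcstombs emits the terminal's encoding (e.g. UTF-8).
    setlocale(LC_ALL, "");

    // A tokenizer that walks a wstring and yields wstring tokens.
    typedef tokenizer<char_separator<wchar_t>,
                      wstring::const_iterator,
                      wstring> wtokenizer;

    wstring ss = L"もも！名詞！果物";
    // Arguments: dropped delimiters, kept delimiters (returned as tokens
    // of their own), and the empty-token policy.
    char_separator<wchar_t> sep(L"！", L"！", keep_empty_tokens);
    wtokenizer wtok(ss, sep);

    char str[64];  // multibyte output buffer, large enough for these tokens

    for (wtokenizer::iterator it = wtok.begin(); it != wtok.end(); ++it) {
        std::size_t i = wcstombs(str, it->c_str(), sizeof(str) - 1);
        if (i == static_cast<std::size_t>(-1)) {  // character not representable in this locale
            cerr << "conversion failed\n";
            continue;
        }
        // Needed: wcstombs writes no '\0' when it stops at the limit, so terminate
        // at the returned length (strlen can't be used here; it needs the '\0' already).
        str[i] = '\0';
        cout << i << " : " << str << "\n";
    }
    return 0;
}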

$ g++ wtokenizer.cpp
$ ./a.out
6 : もも
3 : ！
6 : 名詞
3 : ！
6 : 果物
$

Looks usable: each ！ comes back as a token of its own between the words.

This page, this one, and also this one were useful references. In a different sense, this one helped too.

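With MinGW, if you don't want to add Boost to the include path, copy the entire boost folder from the extracted Boost directory into MinGW's include folder. Also move the libs folder from the extracted Boost directory. (Boost.Tokenizer itself is header-only, so the headers are all this example needs.)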


Last-modified: 2009-06-12 (Fri) 09:59:42
