Japanese text processing with UTF-8 in Perl

アールメカブ


For example, suppose you prepare the following script (saved as utf.pl) for handling Japanese text:

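#!/usr/bin/perl

use strict;
use warnings;
use utf8;                      # the script source itself is UTF-8
binmode(STDIN,  ":utf8");      # decode standard input as UTF-8
binmode(STDOUT, ":utf8");      # encode standard output as UTF-8
use open ':utf8';              # default layer for files opened in this scope

while (<>) {
	# Alternative: /(\w)/ captures any Unicode word character, not only kanji
	# Capture the first kanji (Han character) on the line; print only on a match
	print "$1\n" if /(\p{Han})/;
}

# test.txt
これは試行です．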

Running ./utf.pl < test.txt then correctly captures 「試」, the first kanji on the line.
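The script above captures only the first kanji on each line. As a minimal sketch of the same \p{Han} technique applied to every kanji in a string, the following uses the /g modifier in list context. The helper name han_chars is hypothetical and is not part of the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;                                  # string literals below are UTF-8

# Hypothetical helper: return every kanji (Han character) in a decoded
# string, in order of appearance.
sub han_chars {
    my ($text) = @_;
    return $text =~ /(\p{Han})/g;          # list context: all captures
}

binmode(STDOUT, ":encoding(UTF-8)");       # encode output as UTF-8
my @kanji = han_chars("これは試行です．");
print join(",", @kanji), "\n";             # prints 試,行
```

Hiragana and the full-width period are not in the Han script, so only 試 and 行 are returned.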

Alternatively, to read and process text written in euc-jp, pass the file name as the first argument to a script like this:

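#!/usr/bin/perl

use strict;
use warnings;
use utf8;
binmode(STDOUT, ":utf8");      # encode standard output as UTF-8

my $file = shift @ARGV or die "Usage: $0 FILE\n";

# Decode the input file from euc-jp while reading
open(my $in, "<:encoding(euc-jp)", $file)
	or die "Cannot open $file: $!";

while (<$in>) {
	chomp;                     # drop the newline so it is not printed as a character
	# Split the decoded line into individual characters, one per output line
	foreach my $char (split //) {
		print "$char\n";
	}
}
close($in);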
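The :encoding(euc-jp) layer decodes bytes into Perl characters as the file is read. The same conversion can be done explicitly on a byte string with the core Encode module; the following is a minimal sketch under that assumption, and the helper name eucjp_to_utf8 is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: convert a byte string from euc-jp to UTF-8,
# going through Perl's internal character representation.
sub eucjp_to_utf8 {
    my ($bytes) = @_;
    my $chars = decode("euc-jp", $bytes);  # euc-jp bytes -> characters
    return encode("UTF-8", $chars);        # characters -> UTF-8 bytes
}

# The six bytes below are "日本語" encoded in euc-jp.
my $euc = "\xC6\xFC\xCB\xDC\xB8\xEC";
print eucjp_to_utf8($euc), "\n";           # writes the UTF-8 bytes as-is
```

Because the result is already a UTF-8 byte string, STDOUT needs no encoding layer here; adding one would encode the bytes a second time.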

The following page is a useful reference on Unicode properties in regular expressions:

http://module.jp/blog/regex_unicode_prop.html


Last-modified: 2007-11-02 (Fri) 18:24:56
