I am using the Mojo::DOM Perl module to replace each <img> tag with the value of its data-char attribute. Mojo::DOM converts the &nbsp; entity to \xa0, but when I print the result to the page, the NBSP character becomes \x{fffd} and shows up as a question mark. I have tried replacing \x{00a0} with &nbsp;, but doing that corrupts other Unicode characters. Here's my code:
#!/usr/bin/perl
use utf8;
use strict;
use warnings;
use CGI;
my $cgi = new CGI;
print $cgi->header(-charset => 'utf-8');
my %params = $cgi->Vars;
print q[<html><head><title>UTF-8 Test</title></head><body><form method="POST"><textarea name="msg" cols="50" rows="20">].$params{msg}.q[</textarea><br/><br/><input type="submit"></form>];
if($ENV{REQUEST_METHOD} eq 'POST') {
    require Mojo::DOM;
    my $dom = Mojo::DOM->new($params{msg});
    for my $e ($dom->find('img')->each) {
        my $x = $e->attr('data-char');
        if(defined($x) && $x) {
            $e->replace($x);
        }
        else {
            $e->delete;
        }
    }
    $params{msg} = $dom->to_string();
    print '<hr/><div>'.$params{msg}.'</div>';
}
print q[</body></html>];
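The symptom can be reproduced without CGI or Mojo::DOM, using only the core Encode module. This is a minimal sketch, not a confirmed diagnosis: it assumes that the msg parameter reaches the script as undecoded UTF-8 bytes and that the output is printed without an encoding step. The names $raw, $nbsp, $broken, $fixed, $seen and hex_bytes are hypothetical and exist only for this illustration.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Render a string as space-separated hex, one pair per character/byte.
sub hex_bytes { return join ' ', map { sprintf '%02X', ord($_) } split //, $_[0] }

# Hypothetical stand-in for the POSTed msg: the UTF-8 bytes of U+05D0.
my $raw  = "\xD7\x90";
# The character an HTML parser yields for the &nbsp; entity.
my $nbsp = "\x{00A0}";

# Undecoded bytes plus a character, printed with no encoding step:
# U+00A0 goes out as the lone byte A0, which is not valid UTF-8.
my $broken = $raw . $nbsp;

# Decode on input, work with characters, encode once on output.
my $fixed = encode('UTF-8', decode('UTF-8', $raw) . $nbsp);

print "without decoding: ", hex_bytes($broken), "\n";   # D7 90 A0
print "with decoding:    ", hex_bytes($fixed),  "\n";   # D7 90 C2 A0

# A UTF-8 decoder (such as the browser) replaces the stray A0 with U+FFFD.
my $seen = decode('UTF-8', $broken);
printf "decoder sees U+%04X\n", ord(substr($seen, -1)); # U+FFFD
```

If that assumption holds, it would also explain why substituting \x{00a0} damages other text: the byte A0 occurs inside the UTF-8 sequences of other characters, so a replacement on undecoded bytes hits those as well.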
Contents of msg param that is POSTed:
אֱלֹהִים,+אֵת+הַשָּׁמַיִם,+וְאֵת+הָאָרֶץ. 1 In the beginningpo <img src="p.jpg" data-char=""> Easy Bengali Typing: বাংলা টাইপ করুন Минюст РФ опубликовал список СМИ-иноагентов Japanese Keyboard - 日本語のキーボード Pre-Qin and Han (先秦兩漢)
Here's a screenshot of the output: