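A while ago I ran into a garbled-text problem while using Sitemesh and happened to learn about the pageEncoding attribute. I found some material on it through Google and am recording it here.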
What the SCWCD Exam Study Kit says about contentType:
The contentType attribute specifies the MIME type and character encoding of the
output. The default value of the MIME type is text/html; the default value of the
character encoding is ISO-8859-1. The MIME type and character encoding are
separated by a semicolon, as shown here:
<%@ page contentType="text/html;charset=ISO-8859-1" %>
This is equivalent to writing the following line in a servlet:
response.setContentType("text/html;charset=ISO-8859-1");
Does this mean that if a SetCharacterEncodingFilter is applied to every request, the <%@ page contentType %> directive no longer needs to appear on each JSP page?
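To make the semicolon-separated format from the quotation concrete, here is a minimal sketch in plain Java that splits a contentType value into its MIME type and charset and falls back to the defaults quoted above. The class name ContentTypeDemo and its parse method are hypothetical, written only for illustration; this is not how any particular container parses the directive.

```java
import java.util.Locale;

// Hypothetical helper for illustration only; not part of the Servlet or JSP API.
final class ContentTypeDemo {
    // Defaults as stated in the quoted study-kit text.
    static final String DEFAULT_MIME = "text/html";
    static final String DEFAULT_CHARSET = "ISO-8859-1";

    /** Returns {mimeType, charset}, filling in the defaults when a part is missing. */
    static String[] parse(String contentType) {
        String mime = DEFAULT_MIME;
        String charset = DEFAULT_CHARSET;
        if (contentType != null && !contentType.trim().isEmpty()) {
            String[] parts = contentType.split(";");
            // The first segment, if present and non-empty, is the MIME type.
            if (parts.length > 0 && !parts[0].trim().isEmpty()) {
                mime = parts[0].trim();
            }
            // Any later segment of the form charset=... sets the encoding.
            for (int i = 1; i < parts.length; i++) {
                String param = parts[i].trim();
                if (param.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                    charset = param.substring("charset=".length()).trim();
                }
            }
        }
        return new String[] { mime, charset };
    }

    public static void main(String[] args) {
        String[] a = parse("text/html;charset=UTF-8");
        System.out.println(a[0] + " / " + a[1]);
        String[] b = parse("text/plain");
        System.out.println(b[0] + " / " + b[1]);
    }
}
```

With the second input, which names no charset, the sketch falls back to ISO-8859-1, matching the default described in the quotation.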
And about pageEncoding:
The pageEncoding attribute specifies the character encoding of the JSP page. The
default value is ISO-8859-1. The following line illustrates the syntax:
<%@ page pageEncoding="ISO-8859-1" %>
What section JSP.4.1 of the JSP 2.0 Spec says about pageEncoding:
Describes the character encoding for the JSP page.
For JSP pages in standard syntax, the page character encoding is determined
from the following sources:
• A JSP configuration element page-encoding value whose URL pattern matches
the page.
• The pageEncoding attribute of the page directive of the page. It is a translation-
time error to name different encodings in the pageEncoding attribute of
the page directive of a JSP page and in a JSP configuration element whose
URL pattern matches the page.
• The charset value of the contentType attribute of the page directive. This is
used to determine the page character encoding if neither a JSP configuration
element page-encoding nor the pageEncoding attribute are provided.
• If none of the above is provided, ISO-8859-1 is used as the default character
encoding.
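The lookup order in the list above can be sketched as a small Java method. This is only an illustration of the rules as quoted: the class PageEncodingResolver and its parameter names are hypothetical, and comparing encoding names case-insensitively is a simplifying assumption rather than something the quoted text specifies.

```java
// Hypothetical sketch of the precedence rules quoted from JSP.4.1; not container code.
final class PageEncodingResolver {
    static final String DEFAULT_ENCODING = "ISO-8859-1";

    /**
     * @param configEncoding     page-encoding of a matching JSP configuration element, or null
     * @param directiveEncoding  pageEncoding attribute of the page directive, or null
     * @param contentTypeCharset charset value of the contentType attribute, or null
     */
    static String resolve(String configEncoding,
                          String directiveEncoding,
                          String contentTypeCharset) {
        // Naming different encodings in both places is a translation-time error.
        // Assumption: names are compared case-insensitively.
        if (configEncoding != null && directiveEncoding != null
                && !configEncoding.equalsIgnoreCase(directiveEncoding)) {
            throw new IllegalStateException(
                "translation-time error: page-encoding " + configEncoding
                + " conflicts with pageEncoding " + directiveEncoding);
        }
        if (configEncoding != null) return configEncoding;
        if (directiveEncoding != null) return directiveEncoding;
        // The contentType charset is consulted only when neither of the above is given.
        if (contentTypeCharset != null) return contentTypeCharset;
        return DEFAULT_ENCODING;
    }

    public static void main(String[] args) {
        System.out.println(resolve(null, "UTF-8", "GBK"));
        System.out.println(resolve(null, null, "GBK"));
        System.out.println(resolve(null, null, null));
    }
}
```

The point to notice is that the charset in contentType affects how the page is read only when no pageEncoding is given anywhere else.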
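What does "the character encoding for the JSP page" mean?
Put this way it becomes clear: pageEncoding is the encoding used when the JSP is translated into _jsp.java.
To understand garbled-text problems in JSP, the most important thing is to understand how a JSP is compiled and how its output is produced.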
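1. The JSP is "translated" into *_jsp.java. At this stage JSPC reads the JSP file according to pageEncoding (note: this governs reading) and translates it into Java source (.java) in a uniform UTF-8 encoding. If pageEncoding is set incorrectly, the Chinese text is already garbled at this point.
2. The Java source is compiled into Java bytecode. At this stage javac compiles the UTF-8 Java source into a binary (.class) that is likewise UTF-8.
3. Tomcat or another application server loads and executes the Java bytecode, and writes the result (the HTML page) using the character set specified by contentType.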
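The effect of steps 1 and 3 can be reproduced in plain Java without a servlet container. The sketch below is a simplified model, assuming the JSP file was saved on disk as UTF-8. The class MojibakeDemo and its two methods are hypothetical stand-ins for "read with pageEncoding" and "write with the contentType charset"; they are not what JSPC or Tomcat actually call.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Simplified, hypothetical model of the read and write stages described above.
final class MojibakeDemo {
    /** Step 1: decode the raw bytes of the JSP file using the declared pageEncoding. */
    static String readPage(byte[] fileBytes, Charset pageEncoding) {
        return new String(fileBytes, pageEncoding);
    }

    /** Step 3: encode the response text using the charset named in contentType. */
    static byte[] writeResponse(String text, Charset responseCharset) {
        return text.getBytes(responseCharset);
    }

    public static void main(String[] args) {
        // Two Chinese characters, written as Unicode escapes so that this
        // source file itself is safe under any source encoding.
        String original = "\u4e2d\u6587";
        byte[] onDisk = original.getBytes(StandardCharsets.UTF_8); // assumption: file saved as UTF-8

        String good = readPage(onDisk, StandardCharsets.UTF_8);      // correct pageEncoding
        String bad = readPage(onDisk, StandardCharsets.ISO_8859_1);  // wrong pageEncoding

        System.out.println(good.equals(original)); // true
        System.out.println(bad.equals(original));  // false: garbled during reading
        System.out.println(bad.length());          // 6: each UTF-8 byte became one char

        // Even correctly read text is lost if the output charset cannot represent it.
        byte[] out = writeResponse(good, StandardCharsets.ISO_8859_1);
        System.out.println(out.length);
    }
}
```

The sketch shows two separate failure points: a wrong pageEncoding corrupts the text when the page is read, and a contentType charset that cannot represent the characters corrupts it when the response is written.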
With this process understood, the root cause of garbled text in JSP pages should be clear.
The text above draws on:
1. JavaWorld@TW: the page directive: contentType vs. pageEncoding
2. Matrix: solutions to garbled Chinese text in JSP, DB, and Apache