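There are plenty of articles online about connecting to an HTTP server from Java, and quite a few posts about fetching pages from an HTTP server after user authentication. Very few, however, explain how to log in on an HTTPS server. My project needed it, I struggled with it for a whole day and finally got it working, so here are my notes. In short, HTTPS is HTTP plus an encryption/decryption layer, so an HTTPS connection adds only a certificate verification step to an HTTP connection; once that step passes, everything else works exactly as it does over HTTP. In other words, once the server certificate has been accepted, user login on an HTTPS server is the same as user login on an HTTP server. The whole process is as follows: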
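1. Open the first HttpsURLConnection (to the login page URL) and get through the HTTPS verification of the server certificate on the client side.
2. POST the userID and password to the login page (take the actual parameter names from the login form). After the POST succeeds, read the cookie from that connection and split the SessionID out of it.
3. Open a second HttpsURLConnection (to the URL of the page to fetch) and get through the HTTPS certificate verification again.
4. Call URLConnection.setRequestProperty("Cookie", SessionID) to set the cookie on the second connection, so that both connections belong to the same logged-in session.
5. Call connection.getInputStream() to read the content of the target page.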
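Step 2 sends the credentials as an application/x-www-form-urlencoded body, so user names or passwords that contain characters such as & or = need to be percent-encoded first. The following is a minimal sketch of building such a body. The class name FormBody and the sample values are hypothetical; intranetID and password are simply the field names that appear in the testing code further down, and your login form may use different ones.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the original testing code.
final class FormBody {
    /** Joins the fields as name=value&name=value, percent-encoding names and values. */
    static String encode(Map<String, String> fields) {
        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, String> field : fields.entrySet()) {
            if (body.length() > 0) {
                body.append('&');
            }
            body.append(enc(field.getKey())).append('=').append(enc(field.getValue()));
        }
        return body.toString();
    }

    private static String enc(String text) {
        try {
            return URLEncoder.encode(text, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be present on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<String, String>();
        fields.put("intranetID", "user@example.com"); // placeholder credentials
        fields.put("password", "p&ss=word 1");
        System.out.println(encode(fields));
    }
}
```

The resulting string can be written to conn.getOutputStream() in place of a hand-assembled strPost.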
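Below is the testing code I used. It consists of several independent fragments that have passed my tests; interested readers can adapt it for their own use: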
    try
    {
        if(protocol.equals("http")){
            final HttpURLConnection connection = (HttpURLConnection)iSourceURL.openConnection();
            connection.connect();
            stream = connection.getInputStream();
            // printIoStream(stream); // uncomment to dump the response while debugging
            modelSource = new StreamSource(stream);
            // connection.disconnect();
        }
        else if(protocol.equals("https")){
            try {
                // Step 1: SSL context whose trust manager accepts any server certificate (testing only)
                SSLContext sc = SSLContext.getInstance("TLS");
                sc.init(null, new TrustManager[] { new iTrustManager() },
                        new java.security.SecureRandom());
                // url = new URL("https://9.186.10.56:8443/LogonServlet");
                // Resolve the login servlet against the source URL; this also works when
                // the source URL has no explicit port (getPort() would return -1).
                URL url = new URL(iSourceURL, "/LogonServlet");
                String strPost = "intranetID=*****&password=******";
                HttpsURLConnection conn = (HttpsURLConnection) url.openConnection();
                conn.setSSLSocketFactory(sc.getSocketFactory());
                conn.setHostnameVerifier(new TrustAnyHostnameVerifier());
                addProperty(conn);
                HttpsURLConnection.setFollowRedirects(true); // static setting, applies to all connections
                conn.setInstanceFollowRedirects(true);
                conn.setDoOutput(true); // we send a request body to the server
                conn.setDoInput(true); // and read the response
                conn.setUseCaches(false); // always fetch fresh data from the server
                conn.setAllowUserInteraction(false);
                conn.setRequestMethod("POST");
                // Step 2: post the login form; getOutputStream() opens the connection implicitly
                conn.getOutputStream().write(strPost.getBytes("UTF-8"));
                conn.getOutputStream().flush();
                conn.getOutputStream().close();
                String cookie = conn.getHeaderField("Set-Cookie");
                String SessionID = getSessionIdFromCookie(cookie);
                conn.getInputStream().close(); // the login response body itself is not needed
                conn.disconnect();
                // Steps 3 and 4: second connection to the target page, carrying the session cookie
                final HttpsURLConnection connection = (HttpsURLConnection)iSourceURL.openConnection();
                connection.setSSLSocketFactory(sc.getSocketFactory());
                connection.setHostnameVerifier(new TrustAnyHostnameVerifier());
                connection.setRequestProperty("Cookie", SessionID);
                connection.connect();
                // Step 5: read the target page
                stream = connection.getInputStream();
                modelSource = new StreamSource(stream);
                // printIoStream(stream);
            } catch (Exception e) {
                TMCodePlugin.getInstance().writeToLog(
                        Status.ERROR, "Could not read data via URL(https):" + iSourceURL, null);
                e.printStackTrace();
            }
        }else{
            TMCodePlugin.getInstance().writeToLog(Status.ERROR, "Protocol illegal: " + iSourceURL, null);
        }
    }
    catch(IOException e)
    {
        TMCodePlugin.getInstance().writeToLog(Status.ERROR, "Could not read data via URL:" + iSourceURL, null);
    }
    catch(IllegalArgumentException e)
    {
        TMCodePlugin.getInstance().writeToLog(Status.ERROR, "Could not read data via URL - illegal argument in URL:" + iSourceURL, null);
    }
}
/**
 * Trust manager used for the SSL/TLS connection to the HTTPS server.
 * It accepts every certificate without checking it, which disables
 * server authentication, so use it for testing only.
 * @author chaixzh
 */
class iTrustManager implements X509TrustManager {
    iTrustManager() {
    }
    // called to check a client certificate chain; accepts everything
    public void checkClientTrusted(X509Certificate chain[], String authType)
            throws CertificateException {
        System.out.println("check client trust status");
    }
    // called to check the server certificate chain; accepts everything
    public void checkServerTrusted(X509Certificate chain[], String authType)
            throws CertificateException {
        System.out.println("check Server trust status");
    }
    // no accepted issuers are advertised; the API expects a non-null array
    public X509Certificate[] getAcceptedIssuers() {
        return new X509Certificate[0];
    }
}
// Accepts any host name, so the certificate is never matched against the host (testing only).
private static class TrustAnyHostnameVerifier implements HostnameVerifier {
    public boolean verify(String hostname, SSLSession session) {
        return true;
    }
}
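/**
 * Splits the session ID out of a cookie string.
 * @param cookie the value of the Set-Cookie header
 * @return the "JSESSIONID=..." pair, without any trailing attributes
 */
private String getSessionIdFromCookie(String cookie){
    if (cookie == null || cookie.indexOf("JSESSIONID=") < 0) {
        throw new IllegalArgumentException("No JSESSIONID in Set-Cookie header: " + cookie);
    }
    int index_1 = cookie.indexOf("JSESSIONID=");
    // look for the terminating ';' after JSESSIONID, not from the start of the string
    int index_2 = cookie.indexOf(";", index_1);
    return index_2 < 0 ? cookie.substring(index_1) : cookie.substring(index_1, index_2);
}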
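A server may send several Set-Cookie headers, while getHeaderField("Set-Cookie") returns only one of them. As a hedged alternative to the string splitting above, the sketch below walks over all Set-Cookie values and lets java.net.HttpCookie do the parsing. The class and method names are hypothetical, and the cookie name JSESSIONID is taken from the code above.

```java
import java.net.HttpCookie;
import java.util.List;

// Hypothetical helper, not part of the original testing code.
final class SessionCookies {
    /**
     * Returns "JSESSIONID=value" from the given Set-Cookie header values,
     * or null if no such cookie is present.
     */
    static String sessionCookie(List<String> setCookieHeaders) {
        if (setCookieHeaders == null) {
            return null;
        }
        for (String header : setCookieHeaders) {
            // HttpCookie.parse strips attributes such as Path and HttpOnly.
            for (HttpCookie cookie : HttpCookie.parse(header)) {
                if ("JSESSIONID".equalsIgnoreCase(cookie.getName())) {
                    return cookie.getName() + "=" + cookie.getValue();
                }
            }
        }
        return null;
    }
}
```

With a connection at hand, this would typically be called as SessionCookies.sessionCookie(conn.getHeaderFields().get("Set-Cookie")), assuming the server spells the header name exactly that way.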
/**
 * Prints the content of a stream; for debugging only.
 * @param stream the response stream to dump (decoded as GBK)
 * @throws Exception if reading the stream fails
 */
private void printIoStream(InputStream stream) throws Exception{
    BufferedInputStream buff = new BufferedInputStream(stream);
    Reader r = new InputStreamReader(buff, "gbk");
    BufferedReader br = new BufferedReader(r);
    StringBuffer strHtml = new StringBuffer("");
    String strLine = null;
    while ((strLine = br.readLine()) != null) {
        strHtml.append(strLine + "\r\n");
    }
    System.out.print(strHtml.toString());
}
private void addProperty(URLConnection connection){
    connection.addRequestProperty("Accept", "image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, application/x-shockwave-flash, application/msword, application/vnd.ms-excel, application/vnd.ms-powerpoint, application/x-silverlight, */*");
    connection.setRequestProperty("Referer", "https://9.186.10.56:8443/index.jsp");
    connection.setRequestProperty("Accept-Language", "zh-cn");
    connection.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
    connection.setRequestProperty("Accept-Encoding", "gzip, deflate");
    connection.setRequestProperty("User-Agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; Foxy/1; .NET CLR 2.0.50727;MEGAUPLOAD 1.0)");
    connection.setRequestProperty("Connection", "Keep-Alive");
    connection.setRequestProperty("Cache-Control", "no-cache");
}
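One thing to watch with these headers: because Accept-Encoding advertises gzip and deflate, the server is allowed to answer with a compressed body, and HttpURLConnection hands that body back as is. A minimal sketch of unwrapping it, assuming the server reports the compression in the Content-Encoding response header (the class and method names are hypothetical):

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical helper, not part of the original testing code.
final class ContentDecoding {
    /**
     * Wraps the raw response stream according to the Content-Encoding value.
     * Unknown or missing encodings return the stream unchanged.
     */
    static InputStream decode(InputStream raw, String contentEncoding) throws IOException {
        if ("gzip".equalsIgnoreCase(contentEncoding)) {
            return new GZIPInputStream(raw);
        }
        if ("deflate".equalsIgnoreCase(contentEncoding)) {
            // Assumes the zlib-wrapped form of deflate.
            return new InflaterInputStream(raw);
        }
        return raw;
    }
}
```

It could be used as ContentDecoding.decode(connection.getInputStream(), connection.getContentEncoding()). Alternatively, dropping the Accept-Encoding line from addProperty avoids the issue altogether.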
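Besides this approach, you can also connect through a raw socket or use Apache HttpClient, among other options. They all come down to much the same thing: pass the verification, stay in the same session, and then fetch the content of the target page.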
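Staying in the same session does not have to be done by hand. As one more variant (not what the testing code above does), the JDK's own java.net.CookieManager can be installed as the default CookieHandler; HttpURLConnection and HttpsURLConnection instances opened afterwards then store and resend cookies themselves. A minimal sketch, with a hypothetical class name:

```java
import java.net.CookieHandler;
import java.net.CookieManager;
import java.net.CookiePolicy;

// Hypothetical helper, not part of the original testing code.
final class SessionSetup {
    /**
     * Installs a JVM-wide in-memory cookie manager. Connections opened after
     * this call send back the cookies the same server set earlier.
     */
    static CookieManager installCookieManager() {
        // null selects the default in-memory cookie store
        CookieManager manager = new CookieManager(null, CookiePolicy.ACCEPT_ORIGINAL_SERVER);
        CookieHandler.setDefault(manager);
        return manager;
    }
}
```

With this in place, the manual getSessionIdFromCookie and setRequestProperty("Cookie", ...) steps should no longer be needed. Note that the handler is global to the JVM, so it affects every URLConnection in the process.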
cxzforever