Snowdream


Script for Collecting Yanxi (燕曦) Board Members' Profile Info

Posted on 2007-09-10 12:27 by ZelluX, in category Scripting

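connector.py: logs in to the board as guest and fetches pages using the session cookie.

```python
import urllib, urllib2, cookielib

class MyConnector:
    def __init__(self):
        pass

    def login(self, url):
        # Install a global opener with a cookie jar, so the session cookie
        # set at login is sent with every later urllib2.urlopen() call
        cookie = cookielib.CookieJar()
        opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookie))
        urllib2.install_opener(opener)
        # POST the login form as the guest user (empty password);
        # "data" avoids shadowing the built-in str
        data = urllib.urlencode({'id': 'guest', 'passwd': ''})
        self.sock = urllib2.urlopen(url, data)

    def getHTML(self, url):
        # Return the raw, undecoded page bytes
        self.sock = urllib2.urlopen(url)
        return self.sock.read()
```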
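urllib2 and cookielib exist only in Python 2; in Python 3 they became urllib.request and http.cookiejar. A rough Python 3 counterpart of MyConnector might look like the sketch below. It is a port under assumptions, not the author's code: the method names `login_form` and `get_html` and the GBK default are hypothetical choices. It also keeps the opener on the instance instead of calling `install_opener`, so the cookie jar does not leak into unrelated `urlopen()` calls.

```python
import http.cookiejar
import urllib.parse
import urllib.request


class MyConnector:
    """Python 3 sketch: one opener per instance, with its own cookie jar."""

    def __init__(self):
        # Session cookies set by the login response are stored here
        self.cookies = http.cookiejar.CookieJar()
        self.opener = urllib.request.build_opener(
            urllib.request.HTTPCookieProcessor(self.cookies))

    @staticmethod
    def login_form(user='guest', passwd=''):
        # POST bodies must be bytes in Python 3
        return urllib.parse.urlencode({'id': user, 'passwd': passwd}).encode('ascii')

    def login(self, url, user='guest', passwd=''):
        with self.opener.open(url, self.login_form(user, passwd)) as resp:
            resp.read()

    def get_html(self, url, encoding='gbk'):
        # Assumption: pages are GBK, as the byte patterns in yanxiparser.py suggest
        with self.opener.open(url) as resp:
            return resp.read().decode(encoding, errors='replace')
```

Usage would mirror runner.py: create `MyConnector()`, call `login(loginURL)`, then `get_html(url1)`.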

yanxiparser.py: collects the links to the digest articles, then extracts four fields from each GBK-encoded self-introduction.

```python
# -*- coding: utf-8 -*-
from sgmllib import SGMLParser
import re

class YanxiURLParser(SGMLParser):
    # Collect every <a href> that points into the digest (bbsanc) pages
    def reset(self):
        self.result = []
        SGMLParser.reset(self)

    def start_a(self, attrs):
        for (k, v) in attrs:
            if k == 'href' and v.find('bbsanc') >= 0:
                self.result.append(v)

class YanxiHTMLParser:
    # The pages are GBK-encoded, so the patterns below are GBK byte strings
    def _search(self, pattern, html):
        # Return the first captured group, or '' if the field is missing
        matchObject = re.search(pattern, html)
        if matchObject is None:
            return ''
        return matchObject.group(1).strip()

    def parse(self, html):
        html = html.replace(r'&nbsp;', ' ')
        html = html.replace(r'<br />', '')
        # "就是(.*)啦" ("I am ... !"): user id
        uid = self._search('\xbe\xcd\xca\xc7(.*)\xc0\xb2', html)
        # "来自(.*)，" or "来自(.*)！" ("I come from ..."): hometown
        ufrom = self._search('\xc0\xb4\xd7\xd4(.*)\xa3(\xac|\xa1)', html)
        # "喜欢(.*)" ("I like ..."), up to the end of the line: favorites
        ufav = self._search('\xcf\xb2\xbb\xb6(.*)\n', html)
        # "(.*)是我的生日" ("... is my birthday"): birthday
        ubirth = self._search('\n(.*)\xca\xc7\xce\xd2\xb5\xc4\xc9\xfa\xc8\xd5', html)
        return {"id" : uid, "from" : ufrom, "birth" : ubirth, "fav" : ufav}
```
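On Python 3 the page would normally be decoded first (for example with `.decode('gbk')`) and the patterns written as text rather than GBK bytes. The sketch below works under that assumption, with each pattern's GBK bytes noted alongside; `FIELDS` and `parse_intro` are hypothetical names. It also switches the id and hometown patterns to non-greedy `.*?`, which differs from the original: a greedy `.*` runs to the last 啦 or comma on the line.

```python
import re

# Text equivalents of the GBK byte patterns in yanxiparser.py
FIELDS = {
    'id': r'就是(.*?)啦',            # GBK: \xbe\xcd\xca\xc7 ... \xc0\xb2
    'from': r'来自(.*?)[，！]',       # GBK: \xc0\xb4\xd7\xd4 ... \xa3\xac or \xa3\xa1
    'fav': r'喜欢(.*)\n',             # GBK: \xcf\xb2\xbb\xb6 ... newline
    'birth': r'\n(.*)是我的生日',     # GBK: newline ... \xca\xc7\xce\xd2\xb5\xc4\xc9\xfa\xc8\xd5
}


def parse_intro(text):
    """Extract the four profile fields from an already-decoded self-introduction."""
    text = text.replace('&nbsp;', ' ').replace('<br />', '')
    info = {}
    for key, pattern in FIELDS.items():
        match = re.search(pattern, text)
        # Missing fields become '' instead of raising AttributeError
        info[key] = match.group(1).strip() if match else ''
    return info
```

A typical call would be `parse_intro(raw.decode('gbk', errors='replace'))`, where `raw` is the byte string a fetch returns.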

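runner.py: logs in, walks the two digest directories and prints one line per member.

```python
from connector import MyConnector
from yanxiparser import YanxiURLParser, YanxiHTMLParser

rootURL = 'http://yanxibbs.cn'
loginURL = 'http://yanxibbs.cn/bbslogin.php'
# Digest directories /groups/GROUP_3/06SS/byxx/bjcy and .../bjyr, URL-encoded in "path"
url1 = 'http://yanxibbs.cn/cgi-bin/bbs/bbs0an?path=%2Fgroups%2FGROUP%5F3%2F06SS%2Fbyxx%2Fbjcy'
url2 = 'http://yanxibbs.cn/cgi-bin/bbs/bbs0an?path=%2Fgroups%2FGROUP%5F3%2F06SS%2Fbyxx%2Fbjyr'

conn = MyConnector()
conn.login(loginURL)

def printInfo(url):
    # Fetch the digest index and collect the links to the individual articles
    html = conn.getHTML(url)
    urlParser = YanxiURLParser()
    htmlParser = YanxiHTMLParser()
    urlParser.feed(html)
    urlParser.close()  # flush any data still buffered in the parser
    # Fetch each article and print: id, hometown, birthday, favorites (tab-separated)
    for targetURL in urlParser.result:
        html = conn.getHTML(rootURL + targetURL)
        info = htmlParser.parse(html)
        print "%(id)s\t%(from)s\t%(birth)s\t%(fav)s" % info

printInfo(url1)
printInfo(url2)
```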
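sgmllib was removed in Python 3; `html.parser.HTMLParser` is the usual stand-in. The sketch below ports YanxiURLParser and, instead of `rootURL + targetURL`, resolves each link with `urllib.parse.urljoin`, which handles absolute paths, relative hrefs and already-absolute URLs alike. `collect_links` is a hypothetical helper; passing the index page's own URL (such as `url1`) as `base_url` makes relative hrefs resolve the way a browser would.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class YanxiURLParser(HTMLParser):
    """Python 3 port: collect <a href> values that contain 'bbsanc'."""

    def __init__(self):
        super().__init__()
        self.result = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names
        if tag != 'a':
            return
        for name, value in attrs:
            # value is None for a bare attribute such as <a href>
            if name == 'href' and value and 'bbsanc' in value:
                self.result.append(value)


def collect_links(html, base_url):
    """Return absolute URLs of the digest links found in html."""
    parser = YanxiURLParser()
    parser.feed(html)
    parser.close()
    # Resolve every href against the page it was found on
    return [urljoin(base_url, href) for href in parser.result]
```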

# re: Script for Collecting Yanxi Board Members' Profile Info

2009-01-13 02:16 by SmartQ

Oh my god, you're...

# re: Script for Collecting Yanxi Board Members' Profile Info

2009-01-13 09:48 by ZelluX

@SmartQ
ZelluX@yanxi
