qileilove

This blog has moved to GitHub; please visit http://qaseven.github.io/

A Method for Extending Ganglia Metrics for System Testing

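  Ganglia and Nagios are among the industry's best monitoring and alerting tools. They are aimed mainly at operations, but they can also be used for performance and stability testing.
  Tests of distributed systems tend to run for a long time, and a single machine often hosts several installed systems, which makes machine-level monitoring metrics less accurate.
  Below is a method for process-level monitoring that can be extended to strengthen the monitoring of a cluster. The monitoring script itself is also added to alerting, to guard against the script exiting abnormally (the Nagios extension is described in a separate post).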
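  GEngin.py: the overall engine. It polls the metrics according to the entries in the configuration file under conf and broadcasts them by calling gmetric
  conf: directory holding the metrix configuration file, which defines the metric parameters
  flag: directory holding exactly one flag file. The file name is the task name, and metrics are separated by task name so they are easy to aggregate and compare
  log: directory holding GEngin's log and the log of each metric collection script
  pid: GEngin's pid, used by the alerting script
  script: the actual metric collection scripts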
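  The flag and log conventions above can be sketched in a few lines. This is only an illustration, not the author's code: the function names current_task and metric_log_name are hypothetical, and unlike GEngin.py this sketch counts only files ending in .flag.

```python
import os


def current_task(flag_dir):
    """Return the task name if flag_dir holds exactly one *.flag file, else None."""
    flags = [f for f in os.listdir(flag_dir) if f.endswith(".flag")]
    if len(flags) != 1:
        return None
    # Strip the ".flag" suffix; the rest of the file name is the task name
    return flags[0][:-len(".flag")]


def metric_log_name(task, group, name, item, host):
    """Build the per-metric log file name: <task>_<group><name><item><host>.txt"""
    return "%s_%s%s%s%s.txt" % (task, group, name, item, host)
```

  With the task yarntestD001 and the host tdw-10-16-19-91, metric_log_name yields the same file names that appear in the log/ listing further down.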
  cat conf/metrix.cfg:
YARN|ResourceManager|cpu|ResourceManager_cpu.py|ResourceManager_cpu.txt|int16|Percent|
YARN|ResourceManager|mem|ResourceManager_mem.py|ResourceManager_mem.txt|int16|Percent|
YARN|ResourceManager|lsof|ResourceManager_lsof.py|ResourceManager_lsof.txt|int16|Number|
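  Each line of metrix.cfg has seven pipe-separated fields followed by a trailing pipe. A minimal parsing sketch follows; the field names (group, name, item, script, out_file, value_type, unit) are my own guesses based on how GEngin.py uses each column, and parse_metric_line is a hypothetical helper.

```python
from collections import namedtuple

# Field names are descriptive guesses, not names taken from the original scripts
Metric = namedtuple("Metric", "group name item script out_file value_type unit")


def parse_metric_line(line):
    """Parse one 'a|b|c|d|e|f|g|' line; return None for blank or malformed lines."""
    line = line.strip()
    if not line:
        return None
    # Drop the trailing pipe before splitting so exactly seven fields remain
    fields = line.rstrip("|").split("|")
    if len(fields) != 7:
        return None
    return Metric(*[f.strip() for f in fields])
```

  Returning None for blank or short lines lets a caller skip them instead of failing on tuple unpacking.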
  ls flag/:
  yarntestD001.flag
  ll log/:
-rw-r--r-- 1 yarn users     168 Mar 19 20:02 yarntestD001_YARNResourceManagercputdw-10-16-19-91.txt
-rw-r--r-- 1 yarn users     168 Mar 19 20:02 yarntestD001_YARNResourceManagerlsoftdw-10-16-19-91.txt
-rw-r--r-- 1 yarn users     168 Mar 19 20:02 yarntestD001_YARNResourceManagermemtdw-10-16-19-91.txt
  ll script/:
-rw-r--r-- 1 yarn users  882 Feb 28 17:20 ResourceManager_cpu.py
-rw-r--r-- 1 yarn users 1093 Feb 28 17:45 ResourceManager_lsof.py
-rw-r--r-- 1 yarn users  882 Feb 28 17:18 ResourceManager_mem.py
  cat script/SAMPLE.py:
#!/usr/bin/env python
# coding=gbk
import sys
import time


def CheckInput():
    "Check input parameters: the log file name must be given as the first argument."
    if len(sys.argv) < 2:
        print("Usage: " + sys.argv[0] + " FileNamePrefix")
        sys.exit(1)


if __name__ == '__main__':
    CheckInput()  # make sure the log file name was passed in
    # Append the collected result to a file under the log directory
    LogFile = open("log/" + sys.argv[1], 'a')
    res = "29"  # sample value; a real script collects the metric here
    # Interface to GEngin: the value must be printed as "value:<Value>"
    print("value:" + res)
    ntime = str(time.strftime("%Y-%m-%d %X", time.localtime()))
    LogFile.write(ntime + " " + res + "\n")
    LogFile.close()
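  SAMPLE.py hard-codes its value. The real collection scripts such as ResourceManager_cpu.py are not shown in this post, so the following is only a sketch of how one might collect a process-level CPU figure, assuming a Linux host where ps -eo pcpu,args is available. The helper names and the keyword-matching approach are assumptions.

```python
import subprocess


def cpu_percent_from_ps(ps_output, keyword):
    """Sum the %CPU column of every ps line whose command contains keyword."""
    total = 0.0
    for line in ps_output.splitlines()[1:]:  # skip the header line
        parts = line.split(None, 1)  # -> [pcpu, full command line]
        if len(parts) == 2 and keyword in parts[1]:
            try:
                total += float(parts[0])
            except ValueError:
                continue
    return total


def collect(keyword):
    """Run ps and return the total as an integer string (the config declares int16)."""
    out = subprocess.check_output(["ps", "-eo", "pcpu,args"])
    return str(int(round(cpu_percent_from_ps(out.decode("utf-8", "replace"), keyword))))


def format_for_engine(value):
    """GEngin.py looks for a line that starts with 'value:'."""
    return "value:" + value
```

  The keyword should not occur in the collection script's own command line, otherwise the script counts itself. A script named ResourceManager_cpu.py matching on "ResourceManager" would have exactly that problem.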

  cat GEngin.py :
#!/usr/bin/env python
# coding=gbk
import sys
import os
import time
from time import sleep


def CheckInput():
    "Check the config file, kill any previous instance and record the current pid."
    print("Usage : python ./" + sys.argv[0])
    if not os.path.exists("conf/metrix.cfg"):
        print("Error : config file conf/metrix.cfg does not exist !")
        sys.exit(1)
    # Kill the previous process so that the engine can be restarted
    if os.path.exists("pid/pid.txt"):
        pfile = open("pid/pid.txt", 'r')
        for p in pfile:
            pid = p.strip()
            if pid:
                os.system("kill -9 " + pid)
        pfile.close()
        os.remove("pid/pid.txt")
    # Record our own pid for the alerting script
    pfile = open("pid/pid.txt", 'w')
    pfile.write(str(os.getpid()))
    pfile.close()
if __name__ == '__main__':
    CheckInput()  # check the config file and record the pid
    LogFile = open("log/" + sys.argv[0] + ".log", 'a')
    # File prefix of the logs (the task name taken from the flag file)
    filePre = "noTask"
    for fi in os.listdir("flag"):
        if fi.endswith(".flag"):
            filePre = fi.split('.')[0].strip()
    # Host name for gmetric; only host names starting with "tdw" are picked up
    host = ""
    f = os.popen("hostname")
    for res in f:
        if res.startswith("tdw"):
            host = res.strip()
    f.close()
    LogFile.write("******** Start task " + filePre + " monitoring *******\n")
    # Main loop: idle unless the flag directory holds exactly one file
    while True:
        if len(os.listdir("flag")) != 1:
            sleep(10)
            LogFile.write("Finish previous task " + filePre + "  .... No task ,Main loop .....\n")
            LogFile.flush()
            continue
        if not os.path.exists("flag/" + filePre + ".flag"):
            # The flag file changed: switch to the new task
            LogFile.write("Finish previous task " + filePre + ".....\n")
            for fi in os.listdir("flag"):
                if fi.endswith(".flag"):
                    filePre = fi.split('.')[0].strip()
            LogFile.write("***** Start New Task " + filePre + " monitoring *******\n")
        # Deal with the configured metrics one by one
        insFile = open("conf/metrix.cfg", 'r')
        for line in insFile:
            if not line.strip():
                continue  # skip blank lines
            mGroup, mName, mItem, mShell, mFile, mUnit, mWeiht, nouse = line.split('|')
            outPutFile = filePre + "_" + mGroup + mName + mItem + host + ".txt"
            if mShell.endswith(".py"):
                runCmd = "python script/" + mShell + " " + outPutFile
            elif mShell.endswith(".sh"):
                runCmd = "script/" + mShell + " " + outPutFile
            else:
                continue  # unknown script type
            # Default to "0"; only a "value:<Value>" line overrides it, so
            # other output lines no longer reset the value
            value = "0"
            f = os.popen(runCmd)
            for res in f:
                if res.startswith("value:"):
                    value = res.split(':')[1].strip()
            f.close()
            # gmetric: -t takes the value type (mUnit), -u the unit (mWeiht)
            cmd = "gmetric -n " + mGroup + "_" + mName + "_" + mItem + " -v " + value + " -t " + mUnit + " -u " + mWeiht + " -S " + host + ":" + host
            print(cmd)
            os.popen(cmd).close()
            ntime = str(time.strftime("%Y-%m-%d %X", time.localtime()))
            LogFile.write(ntime + " " + cmd + "\n")
        insFile.close()
        LogFile.flush()
        if len(os.listdir("flag")) == 1 and os.path.exists("flag/" + filePre + ".flag"):
            sleep(8)
    LogFile.close()
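  For each metric GEngin.py does two things: it reads the value: line from a script's output, and it builds the gmetric command. These can be isolated into small functions that are easy to test. The sketch below is one possible way to do so, not the original code. The function names are hypothetical, and leaving out -S when the host name is empty is my own choice. The flags -n, -v, -t, -u and -S are the same ones the script above passes to gmetric.

```python
def extract_value(output, default="0"):
    """Return the text after the last 'value:' line, or default if there is none."""
    value = default
    for line in output.splitlines():
        if line.startswith("value:"):
            value = line.split(":", 1)[1].strip()
    return value


def gmetric_args(group, name, item, value, value_type, unit, host):
    """Build the gmetric command as an argument list, so no shell quoting is needed."""
    args = ["gmetric",
            "-n", "%s_%s_%s" % (group, name, item),
            "-v", value,
            "-t", value_type,
            "-u", unit]
    if host:
        # Same spoof string the script uses: <host>:<host>
        args += ["-S", "%s:%s" % (host, host)]
    return args
```

  The resulting list can be passed directly to subprocess.call, which avoids the string concatenation and the shell that os.popen goes through.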
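  The monitoring metrics as displayed in Ganglia:
  Add the running GEngin.py script itself to monitoring, to guard against the process exiting abnormally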
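  The alerting script is not shown in this post. As a rough sketch of what such a check could look like, the code below reads the pid that GEngin.py writes to pid/pid.txt and tests whether that process still exists. It assumes a Unix host, uses the usual Nagios plugin exit codes (0 for OK, 2 for CRITICAL), and all function names are hypothetical.

```python
import errno
import os


def read_pid(pid_file):
    """Return the pid stored in pid_file, or None if the file is missing or invalid."""
    try:
        with open(pid_file) as f:
            return int(f.read().strip())
    except (IOError, OSError, ValueError):
        return None


def is_alive(pid):
    """Signal 0 sends nothing; it only checks whether the process exists."""
    try:
        os.kill(pid, 0)
    except OSError as e:
        # EPERM means the process exists but belongs to another user
        return e.errno == errno.EPERM
    return True


def check(pid_file):
    """Return (exit_code, message) in the style of a Nagios plugin."""
    pid = read_pid(pid_file)
    if pid is None:
        return 2, "CRITICAL - no valid pid in " + pid_file
    if not is_alive(pid):
        return 2, "CRITICAL - GEngin.py (pid %d) is not running" % pid
    return 0, "OK - GEngin.py running with pid %d" % pid
```

  A wrapper would call check("pid/pid.txt"), print the message and exit with the returned code. Note that a pid can be reused by an unrelated process, so this check can report OK after GEngin.py has died.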

posted on 2014-03-27 16:53 by 顺其自然EVO. Category: 测试学习专栏 (Testing Study Column)

