paulwong

[Mac] Complete List of Mac OS X Keyboard Shortcuts

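Cmd-C Copy the file
Cmd-V Paste the file
Option-drag Copy the file to a new location
Cmd-drag Move and snap to the grid
Cmd-Delete Delete
Cmd-Option-drag Make an alias (shortcut)
Cmd-Shift-Delete Empty the Trash
Cmd-Shift-Option-Delete Force-empty the Trash
Tab Select the next item
Shift-Tab Select the previous item
Return Perform the default action
Escape Close the dialog
Page Up Scroll up one page
Up Arrow Select the previous file
Page Down Scroll down one page
Down Arrow Select the next file
Cmd-Shift-G Open the "Go to Folder" dialog
Cmd-Period [.] Close the dialog
Exposé and system shortcuts
F8 Switch Spaces
Shift-F8 Switch Spaces in slow motion
F9 (default) Use Exposé to show all open windows
F10 (default) Use Exposé to show all open windows of one application
F11 (default) Use Exposé to hide all open windows and show the desktop
Cmd-H Hide the application
Cmd-Option-H Hide other applications
Cmd-Q Quit the application
Cmd-Shift-Q Quit all applications and log the user out
Cmd-Option-Shift-Q Force the user to log out
Cmd-Tab Switch to the next application
Cmd-Shift-Tab Switch to the previous application
Cmd-drag Rearrange the menu bar
Option-click a window Switch windows and hide the current window
Option-click a Dock icon Switch to another application and hide the current application
Control-click an item View the item's shortcut (contextual) menu
Move the pointer over a word, then press Cmd-Control-D Look up the word's definition in Dictionary from within an application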

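Not responding
Cmd-Period [.] Stop the process
Cmd-Option-Escape Open "Force Quit"

Power button Shut down
Cmd-Option-Shift-Power button Force shut down or restart (on some computers)
Cmd-Control-Power button Force restart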

Finder
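Cmd-click the title View the path of the current window
Cmd-double-click (on a folder) Open the folder in a new window
Option-double-click (on a folder) Open the folder in a new window and close the current window
Cmd-1 View as icons
Cmd-2 View as a list
Cmd-Option-Right Arrow In list view, show the contained folders
Left Arrow In list view, close the selected folder
Cmd-Down Arrow In icon or list view, open the selected folder
Cmd-Option-Down Arrow In icon or list view, open the selected folder in a new window and close the current window
Cmd-Shift-Option-Down Arrow (slow motion) In icon or list view, open the selected folder in a new window and close the current window
Cmd-Up Arrow Open the parent folder
Cmd-Option-Up Arrow Open the parent folder and close the current folder
Cmd-3 View as columns
Cmd-4 View as Cover Flow
Cmd-Y Open Quick Look
Cmd-Option-Y View as a slideshow
Cmd-Shift-H Open the user's home folder
Cmd-Option-Shift-Up Arrow Focus the desktop
Cmd-Shift-I Open iDisk
Cmd-Shift-D Open the Desktop
Cmd-Shift-C Open "Computer"
Cmd-Shift-K Open Network
Cmd-Shift-A Open Applications
Double-click the title bar Minimize the window
Cmd-M Minimize the window
Option-click a button Apply to all active windows
Press and hold the scroll bar Scroll quickly through a long document
Option-click the scroll bar Toggle quickly between "Scroll to here" and "Jump to page"
Cmd-Tilde (~) Activate the previous or next window of the current application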

Dock
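Drag the divider Resize the Dock
Option-drag the divider Resize the Dock to a suitable size
Control-click Show the Dock shortcut menu
Control-click an icon Show the item's shortcut menu
Cmd-click Open the folder containing the icon
Option-click Switch and hide the current application
Cmd-Option-click Switch and hide all applications
Cmd-Option-drag Force an application to open a file
Cmd-Option-D Show/hide the Dock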

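Startup
*These shortcuts work only during startup
Hold the left Shift key when you see the progress indicator (which looks like a spinning gear). Prevent automatic login
Hold the Shift key immediately after the startup tone, then release it when you see the progress indicator (which looks like a spinning gear). Start up in Safe Mode (only essential Mac OS X items are started; some features and applications may not work properly.)
Hold the Shift key after clicking the "Log In" button on the login screen. Prevent "Login Items" and Finder windows from opening at login
C Start up from an optical disc
N Start up from the default NetBoot disk image
T Start up in Target Disk Mode
Option Choose the startup disk (on some computers)
Cmd-X Start up with Mac OS X rather than Mac OS 9 (if both are on the same volume)
Hold the mouse button Eject removable discs
Cmd-Option-P-R Reset parameter RAM
Cmd-V Show detailed status messages (verbose mode)
Cmd-S Start up in single-user mode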

Safari
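Cmd-Option-F Google search field
Option-Up Arrow Scroll up one page
Option-Down Arrow Scroll down one page
Cmd-click a link Open in a new tab in the background
Cmd-Shift-click a link Open in a new tab and activate it
Cmd-Option-click a link Open in a new window
Option-click the Close button Close the other tabs
Cmd-Shift-] Select the next tab
Cmd-Shift-[ Select the previous tab
Cmd-Shift-H Open the home page
Cmd-Shift-K Toggle "Block Pop-Up Windows"
Cmd-Option-E Empty the cache
Cmd-Option-R Reload the page without using the cache
Cmd-F Find
Cmd-M Minimize the window
Shift-click a button Slow-motion animation
Cmd-Plus [+] Increase the font size
Cmd-Minus [-] Decrease the font size
Cmd-0 Default font size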

Dashboard
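Use these shortcuts to work with Dashboard and Dashboard widgets.
F12 (default) Show or hide Dashboard
Cmd-R Reload the current widget
Cmd-Equals (=) Show or hide the widget bar
Cmd-Left Arrow, Cmd-Right Arrow Scroll the widget bar
Note: To change the Dashboard shortcut, choose Apple menu > "System Preferences", click "Exposé & Spaces", then click "Exposé".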

Front Row
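You can use the keyboard to control Front Row without using the Apple Remote.
Cmd-Esc (Escape) Open Front Row
Cmd-Esc or Esc Close Front Row from an open menu
Up Arrow, Down Arrow Browse menus and lists
Cmd-Esc or Esc Return to the previous menu
Space bar or Return Select an item in a menu or list
Space bar or Return Play and pause audio or video
Up Arrow, Down Arrow Change the volume
Right Arrow, Left Arrow Go to the next or previous song or photo
Right Arrow, Left Arrow Go to the next or previous chapter of the DVD being played
Right Arrow, Left Arrow (held down) Fast-forward or rewind a song, video, or DVD
On some Apple keyboards and portable computers, you may also be able to use dedicated keys to change the volume and control playback.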

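Keyboard navigation
Control-F1 Turn full keyboard access on/off
Control-F2 Focus the menu bar
Control-F3 Focus the Dock
Control-F4 Focus the active window or the next window
Control-F5 Focus the window toolbar
Control-F6 Focus the floating window
Control-F7 Move between controls or text boxes and lists
Control-F8 Focus the status menus in the menu bar
Cmd-Accent [`] Focus the next window of the active application
Cmd-Shift-Accent [`] Focus the previous window of the active application
Cmd-Option-Accent [`] Focus the window drawer
Cmd-Option-T Show or hide the Character Palette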

posted @ 2012-10-07 19:48 paulwong Views (495) | Comments (0)

Submitting a Hadoop MapReduce job to a remote JobTracker

Posted on August 31, 2012 by pcbje

While messing around with MapReduce code, I’ve found it a bit tedious to generate the jar file, copy it to the machine running the JobTracker, and then run the job every time the job has been altered. I should be able to run my jobs directly from my development environment, as illustrated in the figure below. This post explains how I’ve “solved” this problem. This may also help when integrating Hadoop with other applications. I by no means claim that this is the proper way to do it, but it does the trick for me.

My Hadoop infrastructure

I assume that you have a (single-node) Hadoop 1.0.3 cluster properly installed on a dedicated or virtual machine. In this example, the JobTracker and HDFS reside at IP address 192.168.102.131. Let’s start out with a simple job that does nothing except start up and terminate:

package com.pcbje.hadoopjobs;

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class MyFirstJob {

    public static void main(String[] args) throws Exception {
        Configuration config = new Configuration();
        JobConf job = new JobConf(config);
        job.setJarByClass(MyFirstJob.class);
        job.setJobName("My first job");

        // args[0] is the input path and args[1] the output path.
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        job.setMapperClass(MyFirstJob.MyFirstMapper.class);
        job.setReducerClass(MyFirstJob.MyFirstReducer.class);

        JobClient.runJob(job);
    }

    // Mapper that emits nothing; the job only starts up and terminates.
    private static class MyFirstMapper extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, IntWritable> {
        public void map(LongWritable key, Text value,
                OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
        }
    }

    // Reducer that emits nothing.
    private static class MyFirstReducer extends MapReduceBase
            implements Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterator<IntWritable> values,
                OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
        }
    }
}

Now, most of the examples you find online typically show a local mode setup where all the components of Hadoop (HDFS, JobTracker, etc.) run on the same machine. A typical mapred-site.xml configuration might look like:

<configuration>
<property>
<name>mapred.job.tracker</name>
<value>localhost:9001</value>
</property>
</configuration>

As far as I can tell, such a configuration requires that jobs are submitted from the same node as the JobTracker. This is what I want to avoid. The first thing to do is to change the fs.default.name attribute to the IP address of my NameNode.

Configuration config = new Configuration();
config.set("fs.default.name", "hdfs://192.168.102.131:9000");

And in core-site.xml:

<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://192.168.102.131:9000</value>
</property>
</configuration>

This tells the job to connect to the HDFS residing on a different machine. Running the job with this configuration will read from and write to the remote HDFS correctly, but the JobTracker at 192.168.102.131:9001 will not notice it. This means that the admin panel at 192.168.102.131:50030 won’t list the job either. So the next thing to do is to tell the job configuration to submit the job to the appropriate JobTracker like this:

config.set("mapred.job.tracker", "192.168.102.131:9001");

You also need to change mapred-site.xml to allow external connections. This can be done by replacing “localhost” with the JobTracker’s IP address:

<configuration>
<property>
<name>mapred.job.tracker</name>
<value>192.168.102.131:9001</value>
</property>
</configuration>

Restart Hadoop. Upon trying to run your job, you may get an exception like this:

SEVERE: PriviledgedActionException as:[user] cause:org.apache.hadoop.security.AccessControlException:
org.apache.hadoop.security.AccessControlException: Permission denied: user=[user], access=WRITE, inode="mapred":root:supergroup:rwxr-xr-x

If you do, this may be solved by adding the following to mapred-site.xml:

<configuration>
<property>
<name>mapreduce.jobtracker.staging.root.dir</name>
</property>
</configuration>

And then execute the following commands:

stop-mapred.sh
start-mapred.sh

When you now submit your job, it should be picked up by the admin page at 192.168.102.131:50030. However, it will most probably fail, and the log will tell you something like:

java.lang.ClassNotFoundException: com.pcbje.hadoopjobs.MyFirstJob$MyFirstMapper

In order to fix this, you have to ensure that all dependencies of the submitted job are available to the JobTracker. This can be achieved by exporting the project as a runnable jar and then executing something like:

java -jar myfirstjob-jar-with-dependencies.jar /input/path /output/path

If your user has the appropriate permissions on the input and output directories on HDFS, the job should now run successfully. This can be verified in the console and on the administration panel.

Manually exporting runnable jars requires a lot of clicks in IDEs such as Eclipse. If you are using Maven, you can tell it to build the jar with its dependencies (see this answer for details). This would make the process a whole lot easier. Finally, to make it even easier, place a tiny shell script in the same folder as pom.xml for building the Maven project and executing the jar:

#!/bin/sh
# $1 = jar to run, $2 = input path, $3 = output path
mvn assembly:assembly
java -jar "$1" "$2" "$3"

After making the script executable, you can build and submit the job with the following command:

./build-and-run-job target/myfirstjob-jar-with-dependencies.jar /input/path /output/path

posted @ 2012-10-03 15:06 paulwong Views (780) | Comments (0)

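Fixing HBase MapReduce job failures on pure Windows, without Cygwin

If you run an HBase MapReduce job from Eclipse on Windows, an exception is thrown. By default the MapReduce job runs locally, and the files it creates are given permissions the UNIX way, so it fails with: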

java.lang.RuntimeException: Error while running command to get file permissions : java.io.IOException: Cannot run program "ls": CreateProcess error=2,

The fix is to send the job to a remote host, usually a Linux machine, by adding the following to hbase-site.xml:

<property>
<name>mapred.job.tracker</name>
<value>master:9001</value>
</property>

HDFS permission checking also has to be turned off:

<property>
<name>dfs.permissions</name>
<value>false</value>
</property>

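Also, because the job runs remotely, custom class files such as the Mapper/Reducer must be packaged into a jar and uploaded. For the approach, see:
Hadoop job submission analysis (5) http://www.cnblogs.com/spork/archive/2010/04/21/1717592.html

After several days of study I finally worked it out: the Configuration is the job's configuration, and the remote JobTracker builds and runs the job from it. Since the remote host does not have the custom MapReduce classes, they must be packaged into a jar and uploaded to that host. This does not have to be done by hand every time; it can be set in code: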

conf.set("tmpjars", "d:/aaa.jar");

Also note that on Windows the path separator is ";", so the generated jar list is separated by ";", which the remote Linux host cannot parse. Change it with:

System.setProperty("path.separator", ":");
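Why the separator matters can be shown with a small standalone sketch. This is a simplified model, not Hadoop's own code: the class name, the helper methods, and the jar paths are all hypothetical, and it only assumes that the client joins the jar list with its local path separator while the Linux side splits it on ":".

```java
import java.util.Arrays;
import java.util.List;

// Simplified model (not Hadoop code): a client joins jar paths with some
// separator, and a Linux host later splits the string on ':'.
final class PathSeparatorDemo {
    /** Joins jar paths the way a client would, using the given separator. */
    static String join(List<String> jars, String separator) {
        return String.join(separator, jars);
    }

    /** Splits a classpath string the way a Linux host would, on ':'. */
    static String[] splitOnLinux(String classpath) {
        return classpath.split(":");
    }

    public static void main(String[] args) {
        // Hypothetical jar locations, used only for illustration.
        List<String> jars = Arrays.asList("/tmp/aaa.jar", "/tmp/bbb.jar");
        // Joined with the Windows separator: Linux sees one unusable entry.
        System.out.println(splitOnLinux(join(jars, ";")).length); // prints 1
        // Joined with ':': Linux sees both jars.
        System.out.println(splitOnLinux(join(jars, ":")).length); // prints 2
    }
}
```

Setting the path.separator system property, as above, changes the separator for the whole JVM, so it is best done once at the start of the submitting program.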

References:
http://www.cnblogs.com/xia520pi/archive/2012/05/20/2510723.html

Submitting a job with the hadoop eclipse plugin and adding multiple third-party jars (perfected version)
http://heipark.iteye.com/blog/1171923

posted @ 2012-10-03 02:18 paulwong Views (2423) | Comments (0)

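ZooKeeper Resources

ZooKeeper is essentially a small distributed file system plus a notification mechanism.

Overview of typical ZooKeeper use cases
http://www.coder4.com/archives/3856

!!!!!Installing and using a pseudo-distributed ZooKeeper cluster
http://blog.fens.me/hadoop-zookeeper-intro/

!!!ZooKeeper for configuration management, distributed queues, sessions, caching, and more
http://www.cnblogs.com/xguo/category/495322.html

Implementing a distributed queue with ZooKeeper
http://blog.fens.me/zookeeper-queue/

Implementing a distributed FIFO queue with ZooKeeper
http://blog.fens.me/zookeeper-queue-fifo/

!A ZooKeeper-based distributed session implementation
http://blog.csdn.net/jacktan/article/details/6112806

Integrating ZooKeeper with Spring as a source of property data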
https://github.com/james-wu-shanghai/spring-zookeeper
http://stackoverflow.com/questions/9940476/zookeeper-for-java-spring-config
https://github.com/ryantenney/zookeeper-spring

http://rdc.taobao.com/team/jm/archives/tag/zookeeper

The distributed service framework ZooKeeper: managing data in a distributed environment

http://www.ibm.com/developerworks/cn/opensource/os-cn-zookeeper/

zookeeper

http://baike.baidu.com/view/3061646.htm

Why use ZooKeeper

http://blog.csdn.net/franklysun/article/details/6424213

Managing multiple HBase clusters with ZooKeeper

http://koven2049.iteye.com/blog/1150484

Description of how HBase uses ZooKeeper

http://wiki.apache.org/hadoop/ZooKeeper/HBaseUseCases

How to install a hadoop+hbase+zookeeper cluster
http://linuxjcq.blog.51cto.com/3042600/760634

http://marysee.blog.51cto.com/1000292/629405

posted @ 2012-10-02 10:20 paulwong Views (529) | Comments (0)

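Hadoop 1.0.3 + HBase 0.94.1 pseudo-distributed single-machine setup notes

1. Add an entry to the hosts file mapping master to 127.0.0.1

2. Set up passwordless SSH login

3. Hadoop configuration files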

core-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>



<configuration>

<property>
  <name>hadoop.tmp.dir</name>
  <value>/Users/paul/Documents/PAUL/DOWNLOAD/SOFTWARE/DEVELOP/HADOOP/hadoop-tmp-data</value>
  <description>A base for other temporary directories.</description>
</property>

<property>
  <name>fs.default.name</name>
  <value>hdfs://master:9000</value>
  <description>The name of the default file system.  A URI whose
  scheme and authority determine the FileSystem implementation.  The
  uri's scheme determines the config property (fs.SCHEME.impl) naming
  the FileSystem implementation class.  The uri's authority is used to
  determine the host, port, etc. for a filesystem.</description>
</property>

</configuration>

hdfs-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>



<configuration>

<property>
  <name>dfs.replication</name>
  <value>1</value>
  <description>Default block replication.
  The actual number of replications can be specified when the file is created.
  The default is used if replication is not specified in create time.
  </description>
</property>



</configuration>

mapred-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>



<configuration>

<property>
  <name>mapred.job.tracker</name>
  <value>master:9001</value>
  <description>The host and port that the MapReduce job tracker runs
  at. If "local", then jobs are run in-process as a single map
  and reduce task.
  </description>
</property>

<property>
<name>mapred.tasktracker.tasks.maximum</name>
<value>8</value>
<description>The maximum number of tasks that will be run simultaneously by
a task tracker
</description>
</property>

</configuration>

masters/slaves

master

4. Format the NameNode

5. Start Hadoop

6. HBase configuration file

hbase-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>

  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://master:9000/hbase</value>
  </property>

  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>

  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>localhost</value>
  </property>

</configuration>

7. Start HBase

posted @ 2012-10-01 22:15 paulwong Views (843) | Comments (0)

hudson+maven+sonar+svn: quickly setting up a continuous integration service

http://www.blogjava.net/Nirvana/archive/2012/09/10/387404.html

http://www.blogjava.net/Nirvana/archive/2012/09/10/387408.html

posted @ 2012-09-26 23:15 paulwong Views (715) | Comments (0)

An example of analyzing JVM memory with the ab command

1. JVM startup parameters

I set them like this:

java -Xmx1024m -Xms1024m -Xss128k -XX:NewRatio=4 -XX:SurvivorRatio=4 -XX:MaxPermSize=16m

After starting Tomcat, running jmap -heap `pgrep -u root java` gives the following information:

Heap Configuration:

MinHeapFreeRatio = 40

MaxHeapFreeRatio = 70

MaxHeapSize = 1073741824 (1024.0MB)

NewSize = 1048576 (1.0MB)

MaxNewSize = 4294901760 (4095.9375MB)

OldSize = 4194304 (4.0MB)

NewRatio = 4

SurvivorRatio = 4

PermSize = 12582912 (12.0MB)

MaxPermSize = 16777216 (16.0MB)

Heap Usage:

New Generation (Eden + 1 Survivor Space):

capacity = 178913280 (170.625MB)

used = 51533904 (49.14656066894531MB)

free = 127379376 (121.47843933105469MB)

28.80384508070055% used

Eden Space:

capacity = 143130624 (136.5MB)

used = 51533904 (49.14656066894531MB)

free = 91596720 (87.35343933105469MB)

36.00480635087569% used

From Space:

capacity = 35782656 (34.125MB)

used = 0 (0.0MB)

free = 35782656 (34.125MB)

0.0% used

To Space:

capacity = 35782656 (34.125MB)

used = 0 (0.0MB)

free = 35782656 (34.125MB)

0.0% used

tenured generation:

capacity = 859045888 (819.25MB)

used = 1952984 (1.8625106811523438MB)

free = 857092904 (817.3874893188477MB)

0.22734338494383202% used

Perm Generation:

capacity = 12582912 (12.0MB)

used = 6656024 (6.347679138183594MB)

free = 5926888 (5.652320861816406MB)

52.897326151529946% used

------------------------------------------------------------------------------------------

Working through the calculation with these parameters (see http://blog.sina.com.cn/s/blog_68158ebf0100wp83.html for reference):

-Xmx1024m -Xms1024m -Xss128k -XX:NewRatio=4 -XX:SurvivorRatio=4 -XX:MaxPermSize=16m

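-Xmx1024m sets the maximum heap size to 1024M

-Xms1024m sets the initial heap size to 1024M

-XX:NewRatio=4

so young generation : old generation = 1:4, and 1024M/5 = 204.8M

Hence young generation = 204.8M and old generation = 819.2M

-XX:SurvivorRatio=4

so within the young generation 2 Survivor : 1 Eden = 2:4, and 204.8M/6 = 34.13333333333333M

Hence Eden = 136.5333333333333M and 1 Survivor = 34.13333333333333M

The result shown by jmap -heap <pid>

agrees with our calculation, apart from small rounding differences.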
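The arithmetic above can be reproduced in a few lines. This is a minimal sketch assuming the simple ratio model used in this post; the class and method names are hypothetical, and a real JVM aligns region sizes, which is why jmap reports 136.5MB for Eden rather than 136.53MB.

```java
// Hypothetical helper reproducing the ratio arithmetic from this post.
// It models only -Xmx, -XX:NewRatio and -XX:SurvivorRatio; a real JVM
// also aligns region sizes, so actual values differ slightly.
final class HeapSizing {
    /** Young generation: heap / (NewRatio + 1). */
    static double youngMb(double heapMb, int newRatio) {
        return heapMb / (newRatio + 1);
    }

    /** Old generation: the rest of the heap. */
    static double oldMb(double heapMb, int newRatio) {
        return heapMb - youngMb(heapMb, newRatio);
    }

    /** One survivor space: young / (SurvivorRatio + 2), since there are two survivors. */
    static double survivorMb(double youngMb, int survivorRatio) {
        return youngMb / (survivorRatio + 2);
    }

    /** Eden: SurvivorRatio times one survivor space. */
    static double edenMb(double youngMb, int survivorRatio) {
        return survivorMb(youngMb, survivorRatio) * survivorRatio;
    }

    public static void main(String[] args) {
        double young = youngMb(1024, 4);
        System.out.println("young    = " + young + "M");
        System.out.println("old      = " + oldMb(1024, 4) + "M");
        System.out.println("eden     = " + edenMb(young, 4) + "M");
        System.out.println("survivor = " + survivorMb(young, 4) + "M");
    }
}
```

For -Xmx1024m, NewRatio=4 and SurvivorRatio=4 this yields 204.8M, 819.2M, about 136.53M and about 34.13M, the same figures as the hand calculation.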

------------------------------------------------------------------------------------------

3. Write the test page

Create a new page perf.jsp in the site's root directory with the following content:

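<%
int m = Integer.parseInt(request.getParameter("m"));
int s = Integer.parseInt(request.getParameter("s"));
int size = 1024 * 1024 * m; byte[] buffer = new byte[size]; Thread.sleep(s);
%>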

Note: m sets how much memory (in MB) each request allocates, and s is how many ms to sleep.

4. Monitor memory changes with jstat

·For jstat usage and an introduction, see http://blog.sina.com.cn/s/blog_68158ebf0100woyh.html

Here we use jstat -gcutil `pgrep -u root java` 1500 10

To explain, there are three arguments:

·pgrep -u root java --> gets the Java process ID

·1500 --> take a sample every 1500ms

·10 --> take 10 samples in total

5. Load test with ab

The load-test command: [root@CentOS ~]# ab -c150 -n50000 "http://localhost/perf.jsp?m=1&s=10"

Note: this issues 150 concurrent requests, 50000 requests in total.

I also integrated Apache with Tomcat; for the procedure see: http://blog.sina.com.cn/s/blog_68158ebf0100wnvx.html

By default you can access the page at http://localhost:8080/perf.jsp?m=1&s=10.

------------------------------------------------------------------------------------------

Now run the experiment:

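·First start monitoring the Java memory:

[root@CentOS ~]# jstat -gcutil 8570 1500 10

·Then open another terminal and start the load test:

[root@CentOS ~]# ab -c150 -n50000 "http://localhost/perf.jsp?m=1&s=10"

After both commands finish, the results are as follows:

jstat: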

[root@CentOS ~]# jstat -gcutil 8570 1500 10

S0 S1 E O P YGC YGCT FGC FGCT GCT

0.06 0.00 53.15 2.03 67.18 52 0.830 1 0.218 1.048

0.00 0.04 18.46 2.03 67.18 55 0.833 1 0.218 1.052

0.03 0.00 28.94 2.03 67.18 56 0.835 1 0.218 1.053

0.00 0.04 34.02 2.03 67.18 57 0.836 1 0.218 1.054

0.04 0.00 34.13 2.03 67.18 58 0.837 1 0.218 1.055

0.00 0.04 38.62 2.03 67.18 59 0.838 1 0.218 1.056

0.04 0.00 8.39 2.03 67.18 60 0.839 1 0.218 1.058

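Brief analysis of the results:

One of S0 and S1 is always empty, and a Minor GC is triggered once Eden fills to a certain level. Because the Old Generation is configured fairly large here, no Full GC occurred during the test (the FGC count stays at 1).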

[root@CentOS ~]# ab -c150 -n50000 "http://localhost/perf.jsp?m=1&s=10"

This is ApacheBench, Version 2.0.40-dev <$Revision: 1.146 $> apache-2.0

Benchmarking localhost (be patient)

Completed 5000 requests

Completed 10000 requests

Completed 15000 requests

Completed 20000 requests

Completed 25000 requests

Completed 30000 requests

Completed 35000 requests

Completed 40000 requests

Completed 45000 requests

Finished 50000 requests

Server Software: Apache/2.2.3

Server Hostname: localhost

Server Port: 80

Document Path: /perf.jsp?m=1&s=10

Document Length: 979 bytes

Concurrency Level: 150

Time taken for tests: 13.467648 seconds

Complete requests: 50000

Failed requests: 0

Write errors: 0

Non-2xx responses: 50005

Total transferred: 57605760 bytes

HTML transferred: 48954895 bytes

Requests per second: 3712.60 [#/sec] (mean)

Time per request: 40.403 [ms] (mean) # mean time per request

Time per request: 0.269 [ms] (mean, across all concurrent requests)

Transfer rate: 4177.05 [Kbytes/sec] received

Connection Times (ms)

min mean[+/-sd] median max

Connect: 0 1 46.5 0 3701

Processing: 10 38 70.3 36 6885

Waiting: 3 35 70.3 33 6883

Total: 10 39 84.4 37 6901

Percentage of the requests served within a certain time (ms)

50% 37

66% 38

75% 39

80% 39

90% 41

95% 43

98% 50

99% 58

100% 6901 (longest request)

For a detailed analysis, see: http://blog.sina.com.cn/s/blog_68158ebf0100woyp.html

posted @ 2012-09-26 22:46 paulwong Views (570) | Comments (0)
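The headline figures in the ab report are related by simple arithmetic. The following sketch recomputes them from the run above (50000 requests, concurrency 150, 13.467648 seconds); the class and method names are hypothetical.

```java
// Hypothetical helper; the formulas reproduce how the figures in the ab
// report above relate to each other.
final class AbMetrics {
    /** Requests per second = completed requests / total test time. */
    static double requestsPerSecond(long completed, double seconds) {
        return completed / seconds;
    }

    /** Mean time per request in ms: concurrency * time / requests. */
    static double timePerRequestMs(int concurrency, long completed, double seconds) {
        return concurrency * seconds * 1000.0 / completed;
    }

    /** Mean time per request in ms across all concurrent requests: time / requests. */
    static double timePerRequestAcrossAllMs(long completed, double seconds) {
        return seconds * 1000.0 / completed;
    }

    public static void main(String[] args) {
        long completed = 50000;
        int concurrency = 150;
        double seconds = 13.467648;
        System.out.println(requestsPerSecond(completed, seconds));
        System.out.println(timePerRequestMs(concurrency, completed, seconds));
        System.out.println(timePerRequestAcrossAllMs(completed, seconds));
    }
}
```

The three values come out at roughly 3712.60 requests per second, 40.403 ms and 0.269 ms, matching the report.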

Hadoop Optimization

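Network bandwidth
The servers of the Hadoop cluster were planned from the start to sit under a single switch, which is the deployment the official documentation recommends.

However, the interconnect bandwidth between this switch and the other switches is limited, so clients ran into slow HDFS access.

Connecting the clients that operate the cluster to the DataNodes' switch solved the problem.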
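System parameters
Raising ulimit -n is also a change the official documentation recommends. With only 10 servers in the cluster we had no problems.
As machines and jobs were added, this value had to be raised further.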
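Configuration file management
This cluster runs the Cloudera distribution, whose configuration files live in /etc/hadoop/conf by default, a location only root can modify.

To make changes easier, I keep the configuration files on one machine and distribute them with a script after each change, which ensures that all servers share the same configuration.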
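mapred.tasktracker.map.tasks.maximum
This parameter controls how many map tasks each TaskTracker runs at the same time.

It used to be set equal to the number of CPU cores, and tasks occasionally squeezed the DataNode's resources.

It is now set so that map+reduce+1==num_cpu_cores.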
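The rule map+reduce+1==num_cpu_cores leaves one core free of task slots. As a minimal sketch, the map slot count can be derived from the core count once the reduce slot count is chosen; the helper name is hypothetical, and how to split slots between map and reduce is not stated in this post, so reduceSlots is an assumed input.

```java
// Hypothetical helper applying the rule map + reduce + 1 == num_cpu_cores.
final class SlotPlanner {
    /**
     * Returns the map slot count for a node, given its core count and a
     * chosen reduce slot count (the split itself is an assumption).
     */
    static int mapSlots(int cpuCores, int reduceSlots) {
        int mapSlots = cpuCores - 1 - reduceSlots;
        if (reduceSlots < 0 || mapSlots < 1) {
            throw new IllegalArgumentException(
                    "not enough cores for " + reduceSlots + " reduce slots");
        }
        return mapSlots;
    }

    public static void main(String[] args) {
        // Example: an 8-core node with 2 reduce slots gets 5 map slots.
        System.out.println(mapSlots(8, 2)); // prints 5
    }
}
```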
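Strictly control root access
The Cloudera distribution creates a hadoop user, and all daemons should run as this user.

A mistaken command (/usr/lib/hadoop/bin/hadoop datanode &) once caused root to write new files into the local data directory, after which the correctly started hadoop-user processes could not read or write them.

So the cluster servers no longer offer root access for day-to-day work.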
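Java GC mode
-XX:+UseConcMarkSweepGC was added to both mapred.child.java.opts and HADOOP_OPTS.

The JDK documentation recommends this collector for modern multi-core systems, where it can make full use of the CPUs' concurrency.

This change had a large positive effect on performance.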
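To check which collectors a JVM is actually running with after such a flag change, the standard management API can list them. This is a small sketch; the class name is hypothetical, and the reported names depend on the JVM vendor and version.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Lists the garbage collectors registered in the running JVM.
final class GcReport {
    /** Returns the collector names exactly as the JVM reports them. */
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(collectorNames());
    }
}
```

Running it with the same options as the Hadoop daemons or child tasks shows whether the flag took effect.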
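Choose the right JDK
Some servers in this cluster were running a 32-bit JDK, which cannot create processes with -Xmx4g or more.
They were all standardized on the x64 JDK.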
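A quick way to spot a 32-bit JVM on a node is to print its data model and maximum heap. This is a sketch with a hypothetical class name; the sun.arch.data.model property is specific to HotSpot-style JVMs, so the code falls back to "unknown" when it is absent.

```java
// Reports whether the running JVM is 32- or 64-bit and how much heap it may use.
final class JvmBitness {
    /** Returns "32" or "64" on JVMs that expose the property, else "unknown". */
    static String dataModel() {
        return System.getProperty("sun.arch.data.model", "unknown");
    }

    /** Maximum heap the JVM will attempt to use, in MB (reflects -Xmx). */
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("data model: " + dataModel() + "-bit");
        System.out.println("max heap:   " + maxHeapMb() + " MB");
    }
}
```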
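mapred.reduce.slowstart.completed.maps
This parameter controls when the slowstart feature kicks in. By default, once 5% of the map tasks have completed, reduce tasks are scheduled and begin the copy phase.

But we do not have many machines, and at one point a large number of jobs piled up in the JobTracker with every TaskTracker's map and reduce slots full.

Because the maps did not have enough resources to finish quickly, the reduces could not finish either, and the cluster's resources deadlocked each other.
After changing this parameter to 0.75, the backlog dropped from an average of 10 jobs to 3.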
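The effect of the threshold is easy to quantify. The sketch below assumes the threshold is applied as a fraction of the job's map tasks, rounded up; the class and method names are hypothetical, and this is a model of the setting rather than the JobTracker's code.

```java
// Hypothetical model of the slowstart threshold: the number of map tasks
// that must complete before a job's reduces are scheduled.
final class SlowStart {
    /** Rounds the fraction of total maps up to a whole number of tasks. */
    static int mapsBeforeReduces(int totalMaps, double completedMapsFraction) {
        return (int) Math.ceil(totalMaps * completedMapsFraction);
    }

    public static void main(String[] args) {
        int totalMaps = 200; // example job size, chosen for illustration
        System.out.println(mapsBeforeReduces(totalMaps, 0.05)); // default setting
        System.out.println(mapsBeforeReduces(totalMaps, 0.75)); // value used here
    }
}
```

With 0.75, reduces occupy slots only once most of a job's maps are done, which leaves more capacity for maps while jobs are queued.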
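mapred.fairscheduler.preemption
This parameter is set to true, so that when a user's minimum share cannot be met, the fair scheduler kills other users' tasks to free enough resources.

The cluster runs all kinds of jobs, and some map tasks need to run for several hours. This parameter causes such tasks to be killed so often that they can hardly finish. One task was killed 137 times in 7 hours.

This can be solved by adjusting the fair scheduler's pool configuration, giving such jobs a dedicated pool with minMap==maxMap.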
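mapred.jobtracker.completeuserjobs.maximum
Limits the number of jobs per user that the JobTracker keeps in memory.
Because this value was too large, our JobTracker would fall into frequent Full GCs less than 24 hours after starting.

It is now set to 5, and the JT runs steadily, handling 1500 jobs a day while using only 800M of memory.

This parameter no longer needs to be set in versions after 0.21.0, because 0.21 reworked how completeuserjobs is used: jobs are written to disk as soon as possible and no longer stay in memory for long.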
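mapred.jobtracker.update.faulty.tracker.interval and mapred.jobtracker.max.blacklist.percent
A single badly written job can put a large batch of TaskTrackers on the blacklist, and they take 24 hours to recover. On small and medium clusters this hurts performance badly, and the only fix is to restart the TaskTrackers by hand. So we modified part of the JobTracker code to expose two parameters:

mapred.jobtracker.update.faulty.tracker.interval controls the blacklist reset interval. The default is a fixed 24 hours; we have changed it to 1 hour.

mapred.jobtracker.max.blacklist.percent controls the fraction of TTs that may be blacklisted; we set it to 0.2.
I am adding test cases for these two parameters and plan to submit them to trunk.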
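Use Hive more and streaming less
Streaming is quick and convenient, so we built a lot on it. But a streaming task needs an extra Java process reading and writing stdin/stdout at run time, which adds some performance overhead.

Such needs are better met with a custom Deserializer plus Hive.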

posted @ 2012-09-24 23:28 paulwong Views (829) | Comments (0)

Spring Batch Admin Installation Notes

Download
Download the file from the cloud: http://s3.amazonaws.com/dist.springframework.org/release/BATCHADM/spring-batch-admin-1.2.1.RELEASE.zip
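Changing the JobRepository database
Parameters can be passed when the JVM starts: if -DENVIRONMENT=mysql is passed at JVM startup, batch-mysql.properties is read; if no value is passed, batch-hsql.properties is read by default; only if that file cannot be found is batch-default.properties read. So the preferred approach for a development environment is to delete batch-mysql.properties and the like and keep only the default file, with the database driver and related settings in it. The relevant properties files can be downloaded from http://www.springsource.org/download/community. While at it, change the JDK to 1.6 and the SPRING-BATCH version to the latest, 2.1.8.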
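The lookup order described above can be sketched as follows. This models only the order as this post describes it, not Spring Batch Admin's actual property loading; the class name is hypothetical, and applying the fallback to any missing environment file (not only the hsql one) is an assumption.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of the lookup order described in this post.
final class BatchPropertiesResolver {
    /**
     * Picks batch-<environment>.properties, using "hsql" when no environment
     * is given, and falls back to batch-default.properties when the chosen
     * file is not available.
     */
    static String resolve(String environment, Set<String> availableFiles) {
        String env = (environment == null || environment.isEmpty()) ? "hsql" : environment;
        String candidate = "batch-" + env + ".properties";
        return availableFiles.contains(candidate) ? candidate : "batch-default.properties";
    }

    public static void main(String[] args) {
        // Only the default file is kept, as recommended for development.
        Set<String> files = new HashSet<>(Arrays.asList("batch-default.properties"));
        // -DENVIRONMENT=... is read as a system property; it may be unset.
        System.out.println(resolve(System.getProperty("ENVIRONMENT"), files));
    }
}
```

With only batch-default.properties present, every environment value resolves to the default file, which is why keeping just that file works for development.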
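Building the WAR
First install the PARENT package with Maven, then install the ADMIN package.
Deployment
Start Tomcat and drop the WAR in. To debug in Eclipse, import both projects into Eclipse and install the RUN-JETTY-RUN plugin; Jetty can then be started from within Eclipse without converting the Maven projects into Eclipse web projects. Recommended!
URL: http://localhost:8080/spring-batch-admin-sample.
Deploying a job
Package the job's Spring configuration file and related classes such as the ItemReader into a jar and place it under META-INF/spring/batch/jobs/; the job then shows up in the UI automatically.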

<Click to download the modified console> <Click to download the example>

posted @ 2012-09-23 19:36 paulwong Views (3329) | Comments (0)

Spring Resource Downloads

http://www.springsource.com/download/community

posted @ 2012-09-22 12:09 paulwong Views (298) | Comments (0)
