1. Add "127.0.0.1 master" to /etc/hosts.
2. Set up passwordless SSH login, as sketched below.
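A minimal sketch of step 2 on a single box, assuming the usual OpenSSH layout (the same commands reappear in the cluster walkthrough near the end of these notes):

ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
ssh master   # should log in without prompting for a password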
3. Hadoop configuration files
core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/Users/paul/Documents/PAUL/DOWNLOAD/SOFTWARE/DEVELOP/HADOOP/hadoop-tmp-data</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://master:9000</value>
<description>The name of the default file system. A URI whose
scheme and authority determine the FileSystem implementation. The
uri's scheme determines the config property (fs.SCHEME.impl) naming
the FileSystem implementation class. The uri's authority is used to
determine the host, port, etc. for a filesystem.</description>
</property>
</configuration>
hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
<description>Default block replication.
The actual number of replications can be specified when the file is created.
The default is used if replication is not specified in create time.
</description>
</property>
<!--
<property>
<name>dfs.name.dir</name>
<value>/Users/paul/Documents/PAUL/DOWNLOAD/SOFTWARE/DEVELOP/HADOOP/hadoop-tmp-data/hdfs-data-name</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/Users/paul/Documents/PAUL/DOWNLOAD/SOFTWARE/DEVELOP/HADOOP/hadoop-tmp-data/hdfs-data</value>
</property>
-->
</configuration>
mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>master:9001</value>
<description>The host and port that the MapReduce job tracker runs
at. If "local", then jobs are run in-process as a single map
and reduce task.
</description>
</property>
<property>
<name>mapred.tasktracker.tasks.maximum</name>
<value>8</value>
<description>The maximum number of tasks that will be run simultaneously by
a task tracker.
</description>
</property>
</configuration>
The masters file contains the single line:
master
4. Format the NameNode.
5. Start Hadoop.
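Steps 4 and 5 use the stock Hadoop scripts, run from the Hadoop install directory (the same commands appear in the cluster walkthrough at the end of these notes):

bin/hadoop namenode -format
bin/start-all.sh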
6. HBase configuration files
hbase-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- (stock Apache license header omitted) -->
<configuration>
<property>
<name>hbase.rootdir</name>
<value>hdfs://master:9000/hbase</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>localhost</value><!-- for a single-node setup -->
</property>
</configuration>
7. Start HBase.
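Step 7 uses HBase's bundled launcher, run from the HBase install directory; start Hadoop first, since hbase.rootdir above points at HDFS:

bin/start-hbase.sh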
1. JVM startup parameters
I set them as follows:
java -Xmx1024m -Xms1024m -Xss128k -XX:NewRatio=4 -XX:SurvivorRatio=4 -XX:MaxPermSize=16m
After starting Tomcat, run jmap -heap `pgrep -u root java`, which prints the following:
Heap Configuration:
MinHeapFreeRatio = 40
MaxHeapFreeRatio = 70
MaxHeapSize = 1073741824 (1024.0MB)
NewSize = 1048576 (1.0MB)
MaxNewSize = 4294901760 (4095.9375MB)
OldSize = 4194304 (4.0MB)
NewRatio = 4
SurvivorRatio = 4
PermSize = 12582912 (12.0MB)
MaxPermSize = 16777216 (16.0MB)
Heap Usage:
New Generation (Eden + 1 Survivor Space):
capacity = 178913280 (170.625MB)
used = 51533904 (49.14656066894531MB)
free = 127379376 (121.47843933105469MB)
28.80384508070055% used
Eden Space:
capacity = 143130624 (136.5MB)
used = 51533904 (49.14656066894531MB)
free = 91596720 (87.35343933105469MB)
36.00480635087569% used
From Space:
capacity = 35782656 (34.125MB)
used = 0 (0.0MB)
free = 35782656 (34.125MB)
0.0% used
To Space:
capacity = 35782656 (34.125MB)
used = 0 (0.0MB)
free = 35782656 (34.125MB)
0.0% used
tenured generation:
capacity = 859045888 (819.25MB)
used = 1952984 (1.8625106811523438MB)
free = 857092904 (817.3874893188477MB)
0.22734338494383202% used
Perm Generation:
capacity = 12582912 (12.0MB)
used = 6656024 (6.347679138183594MB)
free = 5926888 (5.652320861816406MB)
52.897326151529946% used
----------------------------------------------------------------------
-Xmx1024m -Xms1024m -Xss128k -XX:NewRatio=4 -XX:SurvivorRatio=4 -XX:MaxPermSize=16m
-Xmx1024m: maximum heap size 1024 MB
-Xms1024m: initial heap size 1024 MB
-XX:NewRatio=4
  young generation : old generation = 1 : 4, so one fifth of the heap is 1024 MB / 5 = 204.8 MB;
  young generation = 204.8 MB, old generation = 819.2 MB
-XX:SurvivorRatio=4
  within the young generation, Eden : each survivor space = 4 : 1, so the two survivor
  spaces plus Eden split 204.8 MB into six parts of 204.8 MB / 6 = 34.13 MB each;
  Eden = 136.53 MB, each survivor space = 34.13 MB
The output of jmap -heap <pid> above matches this calculation.
----------------------------------------------------------------------
3. Write a test page
Create a page named perf.jsp in the web root with the following content:

<%
int m = Integer.parseInt(request.getParameter("m"));
int s = Integer.parseInt(request.getParameter("s"));
int size = 1024 * 1024 * m;
byte[] buffer = new byte[size];
try { Thread.sleep(s); } catch (InterruptedException e) { }
%>

Note: the m parameter sets how much memory each request allocates (in MB); s is how many ms the request sleeps.
4. Use jstat to monitor memory
Here I use: jstat -gcutil `pgrep -u root java` 1500 10
The three arguments:
· pgrep -u root java --> resolves to the Java process ID
· 1500 --> take a sample every 1500 ms
· 10 --> take 10 samples in total
5. Load-test with ab
The command: [root@CentOS ~]# ab -c150 -n50000 "http://localhost/perf.jsp?m=1&s=10"
Note: this sends 50000 requests in total at a concurrency of 150.
On a default install the page is reachable at http://localhost:8080/perf.jsp?m=1&s=10.
----------------------------------------------------------------------
Now run the experiment:
· Start monitoring the Java heap:
[root@CentOS ~]# jstat -gcutil 8570 1500 10
· Open a second terminal and start the load test:
[root@CentOS ~]# ab -c150 -n50000 "http://localhost/perf.jsp?m=1&s=10"
After both commands finish, the results are as follows:
jstat:
[root@CentOS ~]# jstat -gcutil 8570 1500 10
S0 S1 E O P YGC YGCT FGC FGCT GCT
0.06 0.00 53.15 2.03 67.18 52 0.830 1 0.218 1.048
0.00 0.04 18.46 2.03 67.18 55 0.833 1 0.218 1.052
0.03 0.00 28.94 2.03 67.18 56 0.835 1 0.218 1.053
0.00 0.04 34.02 2.03 67.18 57 0.836 1 0.218 1.054
0.04 0.00 34.13 2.03 67.18 58 0.837 1 0.218 1.055
0.00 0.04 38.62 2.03 67.18 59 0.838 1 0.218 1.056
0.04 0.00 8.39 2.03 67.18 60 0.839 1 0.218 1.058
0.04 0.00 8.39 2.03 67.18 60 0.839 1 0.218 1.058
0.04 0.00 8.39 2.03 67.18 60 0.839 1 0.218 1.058
0.04 0.00 8.39 2.03 67.18 60 0.839 1 0.218 1.058
A quick read of the results:
One of S0/S1 is always empty. Once Eden fills past a threshold, a minor GC runs; because the old generation is sized fairly large here, no full GC was triggered.
ab:
[root@CentOS ~]# ab -c150 -n50000 "http://localhost/perf.jsp?m=1&s=10"
This is ApacheBench, Version 2.0.40-dev <$Revision: 1.146 $> apache-2.0
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Copyright 2006 The Apache Software Foundation, http://www.apache.org/
Benchmarking localhost (be patient)
Completed 5000 requests
Completed 10000 requests
Completed 15000 requests
Completed 20000 requests
Completed 25000 requests
Completed 30000 requests
Completed 35000 requests
Completed 40000 requests
Completed 45000 requests
Finished 50000 requests
Server Software: Apache/2.2.3
Server Hostname: localhost
Server Port: 80
Document Path: /perf.jsp?m=1&s=10
Document Length: 979 bytes
Concurrency Level: 150
Time taken for tests: 13.467648 seconds
Complete requests: 50000
Failed requests: 0
Write errors: 0
Non-2xx responses: 50005
Total transferred: 57605760 bytes
HTML transferred: 48954895 bytes
Requests per second: 3712.60 [#/sec] (mean)
Time per request: 40.403 [ms] (mean) # average time per request
Time per request: 0.269 [ms] (mean, across all concurrent requests)
Transfer rate: 4177.05 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 1 46.5 0 3701
Processing: 10 38 70.3 36 6885
Waiting: 3 35 70.3 33 6883
Total: 10 39 84.4 37 6901
Percentage of the requests served within a certain time (ms)
50% 37
66% 38
75% 39
80% 39
90% 41
95% 43
98% 50
99% 58
100% 6901 (longest request)
step1: Install the JDK
1.1 sudo sh jdk-6u10-linux-i586.bin
1.2 sudo gedit /etc/environment
export JAVA_HOME=/home/linkin/Java/jdk1.6.0_23
export JRE_HOME=/home/linkin/Java/jdk1.6.0_23/jre
export CLASSPATH=$CLASSPATH:$JAVA_HOME/lib:$JAVA_HOME/jre/lib
1.3 sudo gedit /etc/profile
Add the following before the umask 022 line:
export JAVA_HOME=/home/linkin/Java/jdk1.6.0_23
export JRE_HOME=/home/linkin/Java/jdk1.6.0_23/jre
export CLASSPATH=$CLASSPATH:$JAVA_HOME/lib:$JAVA_HOME/jre/lib
export PATH=$JAVA_HOME/bin:$JAVA_HOME/jre/bin:$PATH:$HOME/bin
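A quick check that the new JDK is the one being picked up:

source /etc/profile
java -version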
Set the time zone:
cp /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
Install NTP:
yum install ntp
then run
ntpdate cn.pool.ntp.org
to sync the clock against a public time server.
To re-sync automatically at boot, add the following at the end of /etc/rc.d/rc.local:
ntpdate cn.pool.ntp.org
Disable IPv6:
Append the following to /etc/sysctl.conf:
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
Reboot the server (or run sysctl -p to apply the settings without rebooting).
Remove any IPv6 DNS servers.
step2: Passwordless SSH login
2.1 On the master host, generate a key pair:
linkin@master:~$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
2.2 Append the public key to authorized_keys:
linkin@master:~$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
2.3 Copy the public key to the slave host:
linkin@master:~/.ssh$ scp id_dsa.pub linkin@192.168.149.2:/home/linkin
2.4 Log in to the linkin host and run: cat id_dsa.pub >> .ssh/authorized_keys
Note that authorized_keys must have permission 600: chmod 600 .ssh/authorized_keys
2.5 Run the same steps on every DataNode to enable passwordless login in both directions.
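A quick check from master (a sketch; if the key exchange worked, the command runs without a password prompt):

ssh linkin@192.168.149.2 date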
step3: Install Hadoop
3.1 Set JAVA_HOME in hadoop-env.sh:
export JAVA_HOME=/home/linkin/jdk1.6.0_10
3.2 Configure core-site.xml
<property>
<name>hadoop.tmp.dir</name>
<value>/home/linkin/hadoop-0.20.2/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://master:9000</value> <!-- use the hostname here -->
</property>
3.3 Configure hdfs-site.xml
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
3.4 Configure mapred-site.xml
<property>
<name>mapred.job.tracker</name>
<value>master:9001</value> <!-- use the hostname here -->
</property>
3.5 Configure masters and slaves
masters: master (the hostname); slaves: linkin (the hostname). These two files don't need to be copied to the other machines; keeping them on the master is enough.
3.6 Configure the hosts file
127.0.0.1 localhost (note: nothing else, such as the machine name, may go on this line, or HBase's master name will become localhost)
192.168.149.7 master
192.168.149.2 linkin
3.7 Edit /etc/profile, append the following at the end, then run source /etc/profile to make it take effect:
export JAVA_HOME=/home/linkin/jdk1.6.0_10
export JRE_HOME=/home/linkin/jdk1.6.0_10/jre
export CLASSPATH=.:$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH
export PATH=$JAVA_HOME/bin:$PATH
# Hadoop settings
export HADOOP_HOME=/home/linkin/hadoop-0.20.2
export PATH=$HADOOP_HOME/bin:$PATH
# export PATH=$PATH:$HIVE_HOME/bin   # enable if Hive is installed
3.8 Copy hadoop-0.20.2 to the corresponding directory on the other hosts. Also copy /etc/profile and /etc/hosts to the other machines; profile must be re-sourced there.
step4: Format HDFS
bin/hadoop namenode -format
bin/hadoop dfs -ls
step5: Start Hadoop
bin/start-all.sh
HDFS web UI: http://192.168.149.7:50070
Job status: http://192.168.149.7:50030/jobtracker.jsp
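A quick sanity check after start-all.sh, assuming the JDK's jps tool is on the PATH:

jps

On the master this should list NameNode, SecondaryNameNode and JobTracker; on the slave, DataNode and TaskTracker.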
References:
http://wiki.ubuntu.org.cn/%E5%88%A9%E7%94%A8Cloudera%E5%AE%9E%E7%8E%B0Hadoop
If you need to read piles of text files into a database, Spring Batch is a good fit.
Main flow:
A JobRunner launches the Job, the Job runs its Steps, each Step runs a Tasklet, the Tasklet drives Chunks, and each Chunk drives the ItemReader/ItemProcessor/ItemWriter.
Steps can be wired into a flow: place a Decision between Steps, attach a Listener to the preceding Step that puts a value into the Context based on some condition, and the Decision reads that value to decide which Step comes next.
DefaultLineMapper: maps a raw input line (String) to an object
DelimitedLineTokenizer: splits a line on commas into a list of fields
BeanWrapperFieldSetMapper: maps the field set onto a VO (these three are wired together in the sketch below)
FlatFileItemWriter: writes items out to a flat file
DelimitedLineAggregator: turns an object back into a delimited string
Custom Tasklet: if a task is not a plain read or write, add a custom Tasklet class to do the work
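A minimal wiring of the reader-side classes, as a sketch rather than a full job definition: the Person VO, the field names and people.csv are made-up placeholders, and in a real job the Step, not a main method, would drive the read loop.

import org.springframework.batch.item.ExecutionContext;
import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.mapping.BeanWrapperFieldSetMapper;
import org.springframework.batch.item.file.mapping.DefaultLineMapper;
import org.springframework.batch.item.file.transform.DelimitedLineTokenizer;
import org.springframework.core.io.FileSystemResource;

public class ReaderSketch {

    // Hypothetical VO; BeanWrapperFieldSetMapper fills it through its setters.
    public static class Person {
        private String name;
        private int age;
        public void setName(String name) { this.name = name; }
        public void setAge(int age) { this.age = age; }
        public String getName() { return name; }
        public int getAge() { return age; }
    }

    public static void main(String[] args) throws Exception {
        // DelimitedLineTokenizer: split each comma-delimited line into named fields.
        DelimitedLineTokenizer tokenizer = new DelimitedLineTokenizer();
        tokenizer.setNames(new String[] { "name", "age" });

        // BeanWrapperFieldSetMapper: copy the named fields onto a Person bean.
        BeanWrapperFieldSetMapper<Person> fieldSetMapper = new BeanWrapperFieldSetMapper<Person>();
        fieldSetMapper.setTargetType(Person.class);

        // DefaultLineMapper: tokenizer + field-set mapper = String line -> Person.
        DefaultLineMapper<Person> lineMapper = new DefaultLineMapper<Person>();
        lineMapper.setLineTokenizer(tokenizer);
        lineMapper.setFieldSetMapper(fieldSetMapper);

        FlatFileItemReader<Person> reader = new FlatFileItemReader<Person>();
        reader.setResource(new FileSystemResource("people.csv")); // hypothetical input
        reader.setLineMapper(lineMapper);

        // Standalone use needs an explicit open/read/close cycle.
        reader.open(new ExecutionContext());
        Person p;
        while ((p = reader.read()) != null) {
            System.out.println(p.getName() + " / " + p.getAge());
        }
        reader.close();
    }
}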
http://www.cnblogs.com/gulvzhe/archive/2011/11/06/2238125.html
http://www.ibm.com/developerworks/cn/java/j-lo-springbatch1/