Using the Cloudera distribution of hadoop/hbase:
hadoop-0.20.2-cdh3u0
hbase-0.90.1-cdh3u0
zookeeper-3.3.3-cdh3u0
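The FairScheduler scheduling algorithm is supported out of the box.
You only need to change the configuration so that the JobTracker uses FairScheduler instead of the default JobQueueTaskScheduler.
Configure fair-scheduler.xml (in $HADOOP_HOME/conf/):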
<?xml version="1.0"?>
<!-- Only the allocations element belongs in this file; the mapred.fairscheduler.allocation.file property is set in mapred-site.xml. -->
<allocations>
<pool name="qiji-task-pool">
<minMaps>5</minMaps>
<minReduces>5</minReduces>
<maxRunningJobs>5</maxRunningJobs>
<minSharePreemptionTimeout>300</minSharePreemptionTimeout>
<weight>1.0</weight>
</pool>
<user name="ecap">
<maxRunningJobs>6</maxRunningJobs>
</user>
<poolMaxJobsDefault>10</poolMaxJobsDefault>
<userMaxJobsDefault>8</userMaxJobsDefault>
<defaultMinSharePreemptionTimeout>600</defaultMinSharePreemptionTimeout>
<fairSharePreemptionTimeout>600</fairSharePreemptionTimeout>
</allocations>
Configure $HADOOP_HOME/conf/mapred-site.xml, appending the following at the end:
<property>
<name>mapred.jobtracker.taskScheduler</name>
<value>org.apache.hadoop.mapred.FairScheduler</value>
</property>
<property>
<name>mapred.fairscheduler.allocation.file</name>
<value>/opt/hadoop/conf/fair-scheduler.xml</value>
</property>
<property>
<name>mapred.fairscheduler.assignmultiple</name>
<value>true</value>
</property>
<property>
<name>mapred.fairscheduler.sizebasedweight</name>
<value>true</value>
</property>
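By default the FairScheduler places each job in a pool named after the submitting user (mapred.fairscheduler.poolnameproperty defaults to user.name), so a job only lands in qiji-task-pool if the pool name is selected some other way. The fragment below is a minimal sketch of one common approach, not part of the original setup: it assumes a custom job property called pool.name (a name chosen here for illustration) that falls back to the user name when a job does not set it.

```xml
<!-- Assumption: pool.name is an illustrative custom property, not one of the settings listed above. -->
<property>
<name>mapred.fairscheduler.poolnameproperty</name>
<value>pool.name</value>
</property>
<!-- Fall back to one pool per user when a job does not set pool.name. -->
<property>
<name>pool.name</name>
<value>${user.name}</value>
</property>
```

With this in place, a job whose driver accepts generic options could be submitted with -Dpool.name=qiji-task-pool to run in the pool defined in fair-scheduler.xml.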
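Then restart the cluster. With this configuration, when several jobs run at the same time (the pool above allows up to 5 running jobs), a single job can no longer occupy all Map/Reduce slots and leave the other jobs in the Pending state.
The scheduler status can be viewed at: http://<masterip>:50030/scheduler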