Chan Chen Coding...

Hadoop Note (One) - The Motivation For Hadoop

1. Programming for Traditional Distributed Systems Is Complex
    1.1 Data exchange requires synchronization
    1.2 Only finite bandwidth is available
    1.3 Programmers must deal with partial failures of the system

2. Data Storage
    Facebook        15 PB of data
    eBay             5 PB of data

3. Bottleneck
    3.1 Getting the data to the processors becomes the bottleneck
    3.2 Quick calculation
            -- Typical disk data transfer rate: 75 MB/sec
            -- Time to transfer 100 GB of data to the processor: approx. 22 minutes
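
The quick calculation in 3.2 can be reproduced with a few lines of Python. This is only a sketch: the function name transfer_minutes and the choice of 1 GB = 1024 MB are assumptions made for illustration. With 1 GB = 1000 MB the result is slightly lower, but still roughly 22 minutes.

```python
def transfer_minutes(data_gb: float, rate_mb_per_s: float, mb_per_gb: int = 1024) -> float:
    """Minutes needed to read data_gb gigabytes from one disk at rate_mb_per_s."""
    seconds = data_gb * mb_per_gb / rate_mb_per_s
    return seconds / 60


# Figures from the note: 100 GB of data, 75 MB/sec transfer rate.
minutes = transfer_minutes(100, 75)
print(f"{minutes:.1f} min")  # between 22 and 23 minutes
```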

4. Requirement
    4.1 Partial failure support
    4.2 Component recovery
    4.3 Consistency
    4.4 Scalability

5. Hadoop High-Level Overview
    5.1 When data is loaded into the system, it is split into 'blocks', typically 64 MB or 128 MB each
    5.2 Map tasks (the first part of the MapReduce system) work on relatively small portions of data (MapParser)
    5.3 A master program allocates work to nodes such that a Map task works on a block of data stored locally on that node
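
The idea in 5.1 to 5.3 can be sketched in plain Python. This is a toy model, not Hadoop's API: the names split_into_blocks and assign_map_tasks, the node names, and the round-robin block placement are all assumptions made for illustration.

```python
BLOCK_SIZE_MB = 64  # typical block size mentioned in 5.1


def split_into_blocks(file_size_mb, block_size_mb=BLOCK_SIZE_MB):
    """Split a file into (block_id, size_mb) pairs; the last block may be smaller."""
    blocks = []
    offset = 0
    while offset < file_size_mb:
        size = min(block_size_mb, file_size_mb - offset)
        blocks.append((len(blocks), size))
        offset += size
    return blocks


def assign_map_tasks(block_locations):
    """Give each Map task to the node that already stores its block (data locality).

    block_locations maps block_id -> name of the node holding that block.
    Returns a dict mapping node name -> list of block ids to process.
    """
    assignments = {}
    for block_id, node in sorted(block_locations.items()):
        assignments.setdefault(node, []).append(block_id)
    return assignments


blocks = split_into_blocks(200)  # a 200 MB file gives 3 full blocks and one 8 MB block
# Hypothetical placement: blocks spread round-robin over three nodes.
locations = {block_id: f"node{block_id % 3}" for block_id, _ in blocks}
plan = assign_map_tasks(locations)
print(blocks)
print(plan)
```

Because each task is placed on the node that already holds its block, no block has to cross the network before the Map phase starts, which is the point of 5.3.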

6. Fault Tolerance
    6.1 If a node fails, the master detects the failure and reassigns its work to another node
    6.2 If a failed node restarts, it is automatically added back to the system and assigned new tasks
    6.3 If a node appears to be running slowly, the master can redundantly execute another instance of the same task
    6.4 Restarting a task does not require communication with nodes working on other portions of the data
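
The behaviour in 6.1, 6.3 and 6.4 can be illustrated with a small toy model. This is a sketch under stated assumptions, not Hadoop's scheduler: the function names, the round-robin reassignment, and the "less than half of the average progress" rule for spotting slow tasks are all hypothetical.

```python
def reassign_failed(assignments, failed_node, live_nodes):
    """Move the failed node's tasks to live nodes; other nodes keep their tasks untouched."""
    updated = {node: list(tasks) for node, tasks in assignments.items()
               if node != failed_node}
    orphaned = assignments.get(failed_node, [])
    targets = [node for node in live_nodes if node != failed_node]
    if orphaned and not targets:
        raise RuntimeError("no live node left to take over the tasks")
    for i, task in enumerate(orphaned):
        # Round-robin over the surviving nodes (an assumption, not Hadoop's policy).
        updated.setdefault(targets[i % len(targets)], []).append(task)
    return updated


def pick_speculative(progress, threshold=0.5):
    """Return tasks whose progress is below threshold * average progress.

    progress maps task id -> fraction completed (0.0 to 1.0). A master could
    launch a second, redundant instance of each returned task.
    """
    if not progress:
        return []
    average = sum(progress.values()) / len(progress)
    return [task for task, done in progress.items() if done < average * threshold]


assignments = {"node0": [0, 3], "node1": [1], "node2": [2]}
print(reassign_failed(assignments, "node0", ["node1", "node2"]))
print(pick_speculative({"t0": 0.9, "t1": 0.8, "t2": 0.2}))
```

Note that reassign_failed only moves the orphaned tasks; the tasks already running on the surviving nodes are not touched, which mirrors 6.4.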

7. Challenge
    7.1 Master Failure

-----------------------------------------------------
Silence, the way to avoid many problems;
Smile, the way to solve many problems;

posted on 2012-06-07 02:30 Chan Chen