ivaneeo's blog

The power of freedom, the life of freedom.


In Android client development (especially for SNS-style clients) you often need to implement a registration Activity: the user enters a username, password and e-mail address, picks a photo, and registers. In HTML a form does all of this for you, because the submitted fields are automatically packaged into a complete HTTP request. In Android, however, how do you package those parameters and the files to upload into the HTTP protocol yourself?

Let's run an experiment first and see exactly what a form submission puts on the wire.

Step 1: Write a servlet that saves the received HTTP entity body to a file:

public void doPost(HttpServletRequest request, HttpServletResponse response)
        throws ServletException, IOException {

    // Get the input stream: this is the entity body of the HTTP request
    ServletInputStream sis = request.getInputStream();

    // Read buffer
    byte[] buffer = new byte[1024];

    FileOutputStream fos = new FileOutputStream("d:\\file.log");

    // Copy everything from the request stream into file.log
    int len = sis.read(buffer, 0, 1024);
    while (len != -1) {
        fos.write(buffer, 0, len);
        len = sis.read(buffer, 0, 1024);
    }

    fos.close();
    sis.close();
}

Step 2: Create a form page like the following:

<form action="servlet/ReceiveFile" method="post" enctype="multipart/form-data">
    First parameter: <input type="text" name="name1"/> <br/>
    Second parameter: <input type="text" name="name2"/> <br/>
    First file to upload: <input type="file" name="file1"/> <br/>
    Second file to upload: <input type="file" name="file2"/> <br/>
    <input type="submit" value="Submit">
</form>

Note: because files are being uploaded, enctype must be set to multipart/form-data, otherwise the attachments will not be uploaded.

Step 3: Fill in the form, click "Submit", then open D:\file.log in Notepad. The captured data looks like this:

-----------------------------7d92221b604bc

Content-Disposition: form-data; name="name1"

hello

-----------------------------7d92221b604bc

Content-Disposition: form-data; name="name2"

world

-----------------------------7d92221b604bc

Content-Disposition: form-data; name="file1"; filename="C:\2.GIF"

Content-Type: image/gif

GIF89a
(binary GIF image data, unreadable in a text editor)

-----------------------------7d92221b604bc

Content-Disposition: form-data; name="file2"; filename="C:\2.txt"

Content-Type: text/plain

hello everyone!!!

-----------------------------7d92221b604bc--

From the form source we know that four items are uploaded: the parameters name1 and name2, and the files file1 and file2.

Let's first look at the two parameters name1 and name2 in file.log. This time open file.log with UltraEdit (some characters cannot be displayed in Notepad, so a hex editor is needed).

Combining the hex view with the Notepad view, each uploaded parameter follows this format:

1.       The first line is the separator "-----------------------------7d92221b604bc", followed by "\r\n" (shown as 0D 0A in the hex editor).

2.       The second line:

(1)       starts with the HTTP extension header "Content-Disposition: form-data;", indicating that form data is being uploaded;

(2)       then "name="name1"", the name of the parameter;

(3)       then "\r\n" (0D 0A in the hex editor).

3.       The third line is just "\r\n" (0D 0A in the hex editor).

4.       The fourth line is the parameter's value, terminated by "\r\n" (0D 0A in the hex editor).

So every parameter the form uploads is encoded in the parameter section of the HTTP body according to rules 1-4 above.

Combining the hex view with the Notepad view, each uploaded file follows this format:

1.       The first line is the separator "-----------------------------7d92221b604bc", followed by "\r\n" (0D 0A in the hex editor).

2.       The second line:

a)         starts with the HTTP extension header "Content-Disposition: form-data;", indicating that form data is being uploaded;

b)        then "name="file2";", the name of the field;

c)        then "filename="C:\2.txt"", the file name;

d)        then "\r\n" (0D 0A in the hex editor).

3.       The third line is the HTTP entity header "Content-Type: text/plain", which describes the format of the entity content that follows. Computers use many common file formats, and each of them has a registered name, its MIME type (MIME stands for "Multipurpose Internet Mail Extensions").

4.       The fourth line is "\r\n" (0D 0A in the hex editor).

5.       From the fifth line on: the binary content of the uploaded file.

6.       Finally comes the terminating marker "-----------------------------7d92221b604bc--"; note that it differs from the separator only by the trailing "--".

One question remains: how is the separator "-----------------------------7d92221b604bc" determined? Does it have to be exactly the string "7d92221b604bc"?

So far we have only looked at the entity body of the HTTP request; let's capture the complete request with a tool and look for a clue.

With HttpWatch in IE (or the Httpfox add-on in Firefox) you can capture the page's traffic, and the capture shows that the separator string is specified as the boundary in the Content-Type request header.
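To make that concrete, here is a small server-side sketch (my addition, not part of the original post) of how the servlet from step 1 could read the boundary back out of that header; getContentType() is standard Servlet API, the rest is plain string handling.

// Inside doPost(): recover the boundary the browser chose.
// Example header value: multipart/form-data; boundary=---------------------------7d92221b604bc
String contentType = request.getContentType();
String boundary = null;
int idx = (contentType == null) ? -1 : contentType.indexOf("boundary=");
if (idx != -1) {
    boundary = contentType.substring(idx + "boundary=".length());
}
// Every part in the body then starts with "--" + boundary,
// and the whole body ends with "--" + boundary + "--".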

 
Based on the parameter-passing and file-upload rules summarized above, we can write a utility class for Android that implements user registration (filling in personal information plus uploading a picture).

First we need a JavaBean class, FormFile, that encapsulates the information about a file:

public class FormFile {
    /* The file data to upload */
    private byte[] data;
    /* File name */
    private String filname;
    /* Form field name */
    private String formname;
    /* Content type; look up the proper MIME type for the file if needed */
    private String contentType = "application/octet-stream";

    public FormFile(String filname, byte[] data, String formname, String contentType) {
        this.data = data;
        this.filname = filname;
        this.formname = formname;
        if (contentType != null) this.contentType = contentType;
    }

    public byte[] getData() {
        return data;
    }

    public void setData(byte[] data) {
        this.data = data;
    }

    public String getFilname() {
        return filname;
    }

    public void setFilname(String filname) {
        this.filname = filname;
    }

    public String getFormname() {
        return formname;
    }

    public void setFormname(String formname) {
        this.formname = formname;
    }

    public String getContentType() {
        return contentType;
    }

    public void setContentType(String contentType) {
        this.contentType = contentType;
    }
}

 
The code that implements the file upload is as follows:

/**
 * Submits data to the server directly over HTTP, emulating a form submission.
 * @param actionUrl upload URL
 * @param params    request parameters; key is the parameter name, value is the parameter value
 * @param files     files to upload
 */
public static String post(String actionUrl, Map<String, String> params, FormFile[] files) {
    try {
        String BOUNDARY = "---------7d4a6d158c9"; // boundary that separates the parts
        String MULTIPART_FORM_DATA = "multipart/form-data";

        URL url = new URL(actionUrl);
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setDoInput(true);    // allow input
        conn.setDoOutput(true);   // allow output
        conn.setUseCaches(false); // do not use the cache
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Connection", "Keep-Alive");
        conn.setRequestProperty("Charset", "UTF-8");
        conn.setRequestProperty("Content-Type", MULTIPART_FORM_DATA + "; boundary=" + BOUNDARY);

        StringBuilder sb = new StringBuilder();

        // Form parameter parts, built according to the format analyzed above
        for (Map.Entry<String, String> entry : params.entrySet()) {
            sb.append("--");
            sb.append(BOUNDARY);
            sb.append("\r\n");
            sb.append("Content-Disposition: form-data; name=\"" + entry.getKey() + "\"\r\n\r\n");
            sb.append(entry.getValue());
            sb.append("\r\n");
        }
        DataOutputStream outStream = new DataOutputStream(conn.getOutputStream());
        outStream.write(sb.toString().getBytes()); // send the form field data

        // File parts, built according to the format analyzed above
        for (FormFile file : files) {
            StringBuilder split = new StringBuilder();
            split.append("--");
            split.append(BOUNDARY);
            split.append("\r\n");
            split.append("Content-Disposition: form-data;name=\"" + file.getFormname() + "\";filename=\"" + file.getFilname() + "\"\r\n");
            split.append("Content-Type: " + file.getContentType() + "\r\n\r\n");
            outStream.write(split.toString().getBytes());
            outStream.write(file.getData(), 0, file.getData().length);
            outStream.write("\r\n".getBytes());
        }
        byte[] end_data = ("--" + BOUNDARY + "--\r\n").getBytes(); // terminating marker
        outStream.write(end_data);
        outStream.flush();
        int cah = conn.getResponseCode();
        if (cah != 200) throw new RuntimeException("request to url failed");
        InputStream is = conn.getInputStream();
        int ch;
        StringBuilder b = new StringBuilder();
        while ((ch = is.read()) != -1) {
            b.append((char) ch);
        }
        outStream.close();
        conn.disconnect();
        return b.toString();
    } catch (Exception e) {
        throw new RuntimeException(e);
    }
}
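For completeness, here is a minimal usage sketch (my addition, not from the original post). It assumes the post(...) method above lives in a class named HttpUploader (the class name is mine) alongside the FormFile class; the server URL and the image path are placeholders.

import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.util.HashMap;
import java.util.Map;

public class RegisterExample {
    public static void main(String[] args) throws Exception {
        // Text fields, matching the test form above
        Map<String, String> params = new HashMap<String, String>();
        params.put("name1", "hello");
        params.put("name2", "world");

        // Read the picture into a byte array (the path is a placeholder)
        FileInputStream in = new FileInputStream("/sdcard/head.jpg");
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] block = new byte[1024];
        int n;
        while ((n = in.read(block)) != -1) {
            buf.write(block, 0, n);
        }
        in.close();

        FormFile picture = new FormFile("head.jpg", buf.toByteArray(), "file1", "image/jpeg");

        // The URL is a placeholder for your own upload servlet
        String response = HttpUploader.post("http://www.example.com/servlet/ReceiveFile",
                params, new FormFile[] { picture });
        System.out.println(response);
    }
}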

posted @ 2011-06-09 16:26 ivaneeo

1. Configuration instances with the same settings (set...) can be shared as a single instance across the whole application:

Create a configuration instance

First you have to create a freemarker.template.Configuration instance and adjust its settings. A Configuration instance is a central place to store the application level settings of FreeMarker. Also, it deals with the creation and caching of pre-parsed templates.

Probably you will do it only once at the beginning of the application (possibly servlet) life-cycle:
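The code sample this passage refers to did not survive in the post; a minimal sketch along the lines of the FreeMarker manual (the template directory path is illustrative) would look like this:

import java.io.File;
import java.io.IOException;
import freemarker.template.Configuration;
import freemarker.template.ObjectWrapper;

public class FreemarkerSetup {
    // The single, application-wide Configuration instance.
    public static final Configuration CFG = new Configuration();

    // Called once at application (e.g. servlet) start-up.
    public static void init() throws IOException {
        // Where the .ftl template files live (path is illustrative).
        CFG.setDirectoryForTemplateLoading(new File("/where/you/store/templates"));
        CFG.setObjectWrapper(ObjectWrapper.DEFAULT_WRAPPER);
    }
}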

2. Configuration objects with different settings (set...) should be created as separate, independent instances:

From now you should use this single configuration instance. Note however that if a system has multiple independent components that use FreeMarker, then of course they will use their own private Configuration instance.

3. A shared Configuration instance makes it easy to benefit from the MRU cache:

Multithreading

In a multithreaded environment Configuration instances, Template instances and data models should be handled as immutable (read-only) objects. That is, you create and initialize them (for example with set... methods), and then you don't modify them later (e.g. you don't call set...). This allows us to avoid expensive synchronized blocks in a multithreaded environment. Beware with Template instances; when you get a Template instance with Configuration.getTemplate, you may get an instance from the template cache that is already used by other threads, so do not call its set... methods (calling process is of course fine).

The above restrictions do not apply if you access all objects from the same single thread only.

4. Enabling the MRU cache policy

Template caching

FreeMarker caches templates (assuming you use the Configuration methods to create Template objects). This means that when you call getTemplate, FreeMarker not only returns the resulting Template object, but stores it in a cache, so when next time you call getTemplate with the same (or equivalent) path, it just returns the cached Template instance, and will not load and parse the template file again.

cfg.setCacheStorage(new freemarker.cache.MruCacheStorage(20, 250))  

Or, since MruCacheStorage is the default cache storage implementation:

cfg.setSetting(Configuration.CACHE_STORAGE_KEY, "strong:20, soft:250");  

When you create a new Configuration object, initially it uses an MruCacheStorage where maxStrongSize is 0, and maxSoftSize is Integer.MAX_VALUE (that is, in practice, infinite). But using a non-0 maxStrongSize may be a better strategy for high-load servers, since it seems that, with only softly referenced items, the JVM tends to cause just higher resource consumption if the resource consumption was already high, because it constantly throws frequently used templates out of the cache, which then have to be re-loaded and re-parsed.

5. The MRU (Most Recently Used) cache automatically refreshes template content

If you change the template file, then FreeMarker will re-load and re-parse the template automatically when you get the template next time. However, since checking if the file has been changed can be time consuming, there is a Configuration level setting called ``update delay''. This is the time that must elapse since the last checking for a newer version of a certain template before FreeMarker will check that again. This is set to 5 seconds by default. If you want to see the changes of templates immediately, set it to 0. Note that some template loaders may have problems with template updating. For example, class-loader based template loaders typically do not notice that you have changed the template file.
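As a concrete illustration (my addition, continuing with the cfg instance used in the snippets above), the delay can be adjusted on the shared Configuration; in the 2.3.x API setTemplateUpdateDelay takes seconds:

// During development: check the template file for changes on every getTemplate call.
cfg.setTemplateUpdateDelay(0);

// In production: re-check a template at most once per hour.
cfg.setTemplateUpdateDelay(3600);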

6. The MRU cache's two-level caching strategy

A template will be removed from the cache if you call getTemplate and FreeMarker realizes that the template file has been removed meanwhile. Also, if the JVM thinks that it begins to run out of memory, by default it can arbitrarily drop templates from the cache. Furthermore, you can empty the cache manually with the clearTemplateCache method of Configuration.

The actual strategy of when a cached template should be thrown away is pluggable with the cache_storage setting, by which you can plug any CacheStorage implementation. For most users freemarker.cache.MruCacheStorage will be sufficient. This cache storage implements a two-level Most Recently Used cache. In the first level, items are strongly referenced up to the specified maximum (strongly referenced items can't be dropped by the JVM, as opposed to softly referenced items). When the maximum is exceeded, the least recently used item is moved into the second level cache, where they are softly referenced, up to another specified maximum. The size of the strong and soft parts can be specified with the constructor. For example, set the size of the strong part to 20, and the size of soft part to 250:

posted @ 2011-06-09 15:50 ivaneeo

First thing this morning I suddenly found that -put could no longer upload data into HDFS and threw a pile of errors, so I checked the cluster state with bin/hadoop dfsadmin -report:

admin@adw1:/home/admin/joe.wangh/hadoop-0.19.2>bin/hadoop dfsadmin -report
Configured Capacity: 0 (0 KB)
Present Capacity: 0 (0 KB)
DFS Remaining: 0 (0 KB)
DFS Used: 0 (0 KB)
DFS Used%: ?%

-------------------------------------------------
Datanodes available: 0 (0 total, 0 dead)

Shut down Hadoop with bin/stop-all.sh:

admin@adw1:/home/admin/joe.wangh/hadoop-0.19.2>bin/stop-all.sh
stopping jobtracker
172.16.197.192: stopping tasktracker
172.16.197.193: stopping tasktracker
stopping namenode
172.16.197.193: no datanode to stop
172.16.197.192: no datanode to stop

172.16.197.191: stopping secondarynamenode

See that? The datanodes had never actually started. Let's check the log on a datanode:

admin@adw2:/home/admin/joe.wangh/hadoop-0.19.2/logs>vi hadoop-admin-datanode-adw2.hst.ali.dw.alidc.net.log

************************************************************/
2010-07-21 10:12:11,987 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Incompatible namespaceIDs in /home/admin/joe.wangh/hadoop/data/dfs.data.dir: namenode namespaceID = 898136669; datanode namespaceID = 2127444065
        at org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(DataStorage.java:233)
        at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:148)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:288)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:206)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1239)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1194)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1202)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1324)
......

The error says the namespaceIDs are inconsistent.

Two workarounds are given below; I used the second one.

Workaround 1: Start from scratch

I can testify that the following steps solve this error, but the side effects won't make you happy (me neither). The crude workaround I have found is to:

1.     stop the cluster

2.     delete the data directory on the problematic datanode: the directory is specified by dfs.data.dir in conf/hdfs-site.xml; if you followed this tutorial, the relevant directory is /usr/local/hadoop-datastore/hadoop-hadoop/dfs/data

3.     reformat the namenode (NOTE: all HDFS data is lost during this process!)

4.     restart the cluster

If deleting all the HDFS data and starting from scratch does not sound like a good idea (it might be ok during the initial setup/testing), you might give the second approach a try.

Workaround 2: Updating namespaceID of problematic datanodes

Big thanks to Jared Stehler for the following suggestion. I have not tested it myself yet, but feel free to try it out and send me your feedback. This workaround is "minimally invasive" as you only have to edit one file on the problematic datanodes:

1.     stop the datanode

2.     edit the value of namespaceID in <dfs.data.dir>/current/VERSION to match the value of the current namenode

3.     restart the datanode

If you followed the instructions in my tutorials, the full path of the relevant file is /usr/local/hadoop-datastore/hadoop-hadoop/dfs/data/current/VERSION (background: dfs.data.dir is by default set to ${hadoop.tmp.dir}/dfs/data, and we set hadoop.tmp.dir to /usr/local/hadoop-datastore/hadoop-hadoop).

If you wonder what the contents of VERSION look like, here's one of mine:

#contents of <dfs.data.dir>/current/VERSION

namespaceID=393514426

storageID=DS-1706792599-10.10.10.1-50010-1204306713481

cTime=1215607609074

storageType=DATA_NODE

layoutVersion=-13

 

Cause: every namenode format creates a new namespaceID, but tmp/dfs/data still holds the ID from the previous format. Formatting clears the namenode's data but not the datanode's data, which makes start-up fail. The fix is to clear all the directories under tmp before each format.

posted @ 2011-06-09 14:20 ivaneeo

// (Re)builds the ZooKeeper session and makes sure the election parent znodes exist.
private void buildZK() {
    System.out.println("Build zk client");
    try {
        zk = new ZooKeeper(zookeeperConnectionString, 10000, this);
        Stat s = zk.exists(rootPath, false);
        if (s == null) {
            zk.create(rootPath, new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
            zk.create(rootPath + "/ELECTION", new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        }
        // Register this host as an ephemeral, sequential election node.
        String value = zk.create(rootPath + "/ELECTION/n_", hostAddress, ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
    } catch (Exception e) {
        e.printStackTrace();
        System.err.println("Error connecting to ZooKeeper");
    }
}

// Watcher callback: rebuild the session when the connection is lost or expired.
public void process(WatchedEvent event) {
    System.out.println(event);
    if (event.getState() == Event.KeeperState.Disconnected || event.getState() == Event.KeeperState.Expired) {
        System.out.println("Zookeeper connection timeout.");
        buildZK();
    }
}
posted @ 2011-06-09 13:38 ivaneeo

Modify the configuration


Copy conf/zoo_sample.cfg to conf/zoo.cfg and change the data directory in it.

# cat /opt/apps/zookeeper/conf/zoo.cfg
tickTime=2000
initLimit=5
syncLimit=2
dataDir=/opt/zkdata
clientPort=2181

The relevant settings are:

  • tickTime: the heartbeat interval used between ZooKeeper servers and between a server and its clients, in milliseconds.
  • initLimit: the initial delay for electing a leader. Because a server needs some time to load its data at start-up (especially when there is a lot of configuration data), some time must be allowed for initialization before data is synchronized right after the leader is elected; increase it if necessary. The delay is initLimit*tickTime, i.e. the value is a number of tickTime intervals.
  • syncLimit: the maximum response time between the leader and a follower. If a follower exceeds this time (syncLimit*tickTime), the leader considers the follower dead and removes it from the server list.

In standalone mode only the three parameters tickTime/dataDir/clientPort are needed, which is handy for single-machine debugging.

Cluster configuration


Add the configuration for the other machines:

# cat /opt/apps/zookeeper/conf/zoo.cfg
tickTime=2000
initLimit=5
syncLimit=2
dataDir=/opt/zkdata
clientPort=2181
server.1=10.11.5.202:2888:3888
server.2=192.168.105.218:2888:3888
server.3=192.168.105.65:2888:3888

Each server.X entry describes one machine. X is a unique sequence number such as 1/2/3, and the value is IP:PORT:PORT. The IP is the ZooKeeper server's IP address or host name; the first port (e.g. 2888) is used to exchange data between servers, i.e. it is the port a follower uses to connect to the leader, and the second port (e.g. 3888) is the port the servers use to elect a leader. To run a cluster on a single machine, simply use different ports.

Synchronize the installation directory

# rsync --inplace -vzrtLp --delete-after --progress /opt/apps/zookeeper root@192.168.105.218:/opt/apps
# rsync --inplace -vzrtLp --delete-after --progress /opt/apps/zookeeper root@192.168.106.65:/opt/apps

Create each server's id

Note that this id must match the server.X entries in zoo.cfg.

ssh root@10.11.5.202 'echo 1 > /opt/zkdata/myid'
ssh root@192.168.105.218 'echo 2 > /opt/zkdata/myid'
ssh root@192.168.106.65 'echo 3 > /opt/zkdata/myid'

Start the servers


ssh root@10.11.5.202 '/opt/apps/zookeeper/bin/zkServer.sh start'
ssh root@192.168.105.218 '/opt/apps/zookeeper/bin/zkServer.sh start'
ssh root@192.168.106.65 '/opt/apps/zookeeper/bin/zkServer.sh start'

Firewall configuration


If the iptables firewall is enabled, add the following rules to /etc/sysconfig/iptables:

-A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 2181 -j ACCEPT
-A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 2888 -j ACCEPT
-A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 3888 -j ACCEPT

Restart the firewall:

service iptables restart 
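Once the ensemble is up, a client connects by listing all three clientPort addresses; a minimal connection sketch (my addition; the session timeout and the sanity check are illustrative) looks like this:

import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

public class ZkConnectExample {
    public static void main(String[] args) throws Exception {
        // List every server's clientPort so the client can fail over between them.
        String connectString = "10.11.5.202:2181,192.168.105.218:2181,192.168.105.65:2181";
        ZooKeeper zk = new ZooKeeper(connectString, 10000, new Watcher() {
            public void process(WatchedEvent event) {
                System.out.println("ZooKeeper event: " + event);
            }
        });
        // Simple sanity check: list the children of the root znode.
        System.out.println(zk.getChildren("/", false));
        zk.close();
    }
}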
posted @ 2011-06-08 18:07 ivaneeo

Lately I have often been asking myself questions and then finding the answers through Google, reading code and testing; maybe these questions can help other people too.

Quora is a nice Q&A site; go and have a look at the topics that interest you.

1) What does the TTL parameter in HBase mean?
TTL == "Time To Live". You can specify how long a cell lives in HBase.
Once its TTL has expired, it is removed.
2) Which configuration parameters affect read performance?
hbase-env.sh:
export HBASE_HEAPSIZE=4000
hbase-default.xml:
hfile.block.cache.size
3) Does HBase update the LruBlockCache on writes?
Judging from the code, writes do not update the LruBlockCache.
4) How do I mark an HBase column family as IN_MEMORY?
You can set the CF's attributes when creating the table: create 'taobao', {NAME => 'edp', IN_MEMORY => true} (a Java equivalent is sketched at the end of this post).
5) The smallest unit HBase loads into the cache at a time is a block.
6) If a block is loaded into the cache once and never read again, it contributes nothing to the block
cache hit ratio, so why is the hit ratio still 60%+? (This was a misunderstanding of mine; it had
quite a few holes.)
Note that the smallest unit of measurement for the block cache hit ratio is a record, while the smallest
unit of the cache is a block. A block holds many records, and later records benefit from the cache that
was populated when the first record was read, which is why the block cache hit ratio can reach 60%+.


7) With only one row and one column family, will writing a very large amount of data trigger a region split?

<property>
  <name>hbase.hregion.max.filesize</name>
  <value>67108864</value>
  <description>
  Maximum HStoreFile size. If any one of a column families' HStoreFiles has
  grown to exceed this value, the hosting HRegion is split in two.
  Default: 256M.
  </description>
</property>

Test: after setting hbase.hregion.max.filesize to 64 MB, I created a table with a single CF and wrote data only under a single row + CF. The data volume was about 80 MB (the web UI showed 107 MB), yet no region split happened. This shows that the smallest unit of a region split is the row key: since there is only one row here, no split occurs even though the data volume has exceeded the limit.
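As a Java counterpart to the shell command in question 4 (my addition, not from the original post; the table name, family name and TTL value are illustrative), the same attributes can be set through the HBaseAdmin/HColumnDescriptor API of that era:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class CreateInMemoryTable {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);

        HTableDescriptor table = new HTableDescriptor("taobao");
        HColumnDescriptor family = new HColumnDescriptor("edp");
        family.setInMemory(true);            // keep this family's blocks in the block cache
        family.setTimeToLive(7 * 24 * 3600); // TTL in seconds; expired cells are removed
        table.addFamily(family);

        admin.createTable(table);
    }
}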

posted @ 2011-06-08 18:02 ivaneeo

Hi all.
I've a thinkpad T60 with 9.10 installed. I did some searching on the forums and found the workaround with the tpb package to fix the thinkpad volume buttons issue.
My problems with that fix are:
-the tpb package depends on xosd (or something like that, NOT Notify-OSD), so the result is not the best...
-the tpb package is not necessary at all, because the thinkpad_acpi module can take care of the volume buttons as well; you just have to enable the hotkey mask! http://www.thinkwiki.org/wiki/Thinkpad-acpi

So my workaround on T60 (in terminal):
9.04 jaunty:
Code:
sudo echo enable,0x00ffffff > /proc/acpi/ibm/hotkey
9.10 karmic: (using sysfs): (also works on 10.04 and 10.10 as well...)
Code:
sudo cp /sys/devices/platform/thinkpad_acpi/hotkey_all_mask /sys/devices/platform/thinkpad_acpi/hotkey_mask
Update:
The solution only works until the next reboot or suspend/resume cycle.
you should put the commands in:
/etc/rc.local
without sudo of course, to make it permanent.


Please confirm if the solution works on other thinkpad models.

As soon as I find solution for all the things I need on my T60 I will put it up on Thinkwiki and paste the link here.
(Active protection - hdaps)
(Trackpoint additional functions - you just have to install the: gpointing-device-settings package)
(fingerprint reader - thinkfinger)

Hope it helped for someone.
posted @ 2011-05-31 15:16 ivaneeo

Distributed File Systems (DFS) are a type of file system that provides some extra features over normal file systems; they are used for storing and sharing files across a wide area network and provide easy programmatic access. File systems like HDFS from Hadoop and many others fall into the category of distributed file systems and are widely used and quite popular.

This tutorial provides a step-by-step guide to accessing and using a distributed file system for storing and retrieving data with Java. The Hadoop Distributed File System has been used for this tutorial because it is freely available, easy to set up and one of the most popular and well-known distributed file systems. The tutorial demonstrates how to access the Hadoop distributed file system from Java, showing all the basic operations.

Introduction
Distributed File Systems (DFS) are a type of file system that provides some extra features over normal file systems; they are used for storing and sharing files across a wide area network and provide easy programmatic access.

A distributed file system makes files spread across multiple servers appear to users as if they reside in one place on the network. It allows administrators to consolidate file shares that may exist on multiple servers so that they appear to be in the same location, and users can access them from a single point on the network.
HDFS stands for Hadoop Distributed File System and is a distributed file system designed to run on commodity hardware. Some of the features provided by Hadoop are:
•    Fault tolerance: Data can be replicated, so if any of the servers goes down, the resources will still be available to the user.
•    Resource management and accessibility: Users do not need to know the physical location of the data; they can access all the resources through a single point. HDFS also provides a web browser interface to view the contents of files.
•    It provides high-throughput access to application data.

This tutorial will demonstrate how to use HDFS for basic distributed file system operations using Java. Java 1.6 and the Hadoop driver have been used (the link is given in the Pre-requisites section). The development environment consists of Eclipse 3.4.2 and Hadoop 0.19.1 on Microsoft Windows XP SP3.


Pre-requisites

1.      Hadoop-0.19.1 installation

2.      Hadoop-0.19.1-core.jar file

3.      Commons-logging-1.1.jar file

4.      Java 1.6

5.      Eclipse 3.4.2



Creating New Project and FileSystem Object

First step is to create a new project in Eclipse and then create a new class in that project. 
Now add all the jar files to the project, as mentioned in the pre-requisites.
The first step in using or accessing the Hadoop Distributed File System (HDFS) is to create a file system object.
Without this object you cannot perform any operations on HDFS, so a file system object always has to be created.
Two input parameters are required to create the object: the host name and the port.
The code below shows how to create a file system object to access HDFS.

Configuration config = new Configuration();

config.set("fs.default.name","hdfs://127.0.0.1:9000/");

FileSystem dfs = FileSystem.get(config);


Here Host name = “127.0.0.1” & Port = “9000”.

Various HDFS operations

Now we will see various operations that can be performed on HDFS.

Creating Directory

Now we will start with creating a directory.
First step for using HDFS is to create a directory where we will store our data. 
Now let us create a directory named “TestDirectory”.

String dirName = "TestDirectory";

Path src = new Path(dfs.getWorkingDirectory()+"/"+dirName);

dfs.mkdirs(src);

Here the dfs.getWorkingDirectory() function returns the path of the working directory, the base directory under which all the data will be stored. The mkdirs() function accepts an object of type Path, so as shown above a Path object is created first. The directory is to be created inside the base working directory, so the Path object is built accordingly. The dfs.mkdirs(src) function will create a directory named "TestDirectory" in the working folder.

Sub directories can also be created inside the “TestDirectory”; in that case path specified during creation of Path object will change. For example a directory named “subDirectory” can be created inside directory “TestDirectory” as shown in below code.

String subDirName = "subDirectory";

Path src = new Path(dfs.getWorkingDirectory()+"/TestDirectory/"+ subDirName);

dfs.mkdirs(src);

Deleting Directory or file

Existing directory in the HDFS can be deleted. Below code shows how to delete the existing directory.

String dirName = "TestDirectory";

Path src = new Path(dfs.getWorkingDirectory()+"/"+dirName);

dfs.delete(src);


Please note that delete() method can also be used to delete files. What needs to be deleted should be specified in the Path object.

Copying file to/from HDFS from/to Local file system

Basic aim of using HDFS is to store data, so now we will see how to put data in HDFS.
Once directory is created, required data can be stored in HDFS from the local file system.
So consider that a file named “file1.txt” is located at “E:\HDFS” in the local file system, and it is required to be copied under the folder “subDirectory” (that was created earlier) in HDFS.
Code below shows how to copy file from local file system to HDFS.

Path src = new Path("E://HDFS/file1.txt");

Path dst = new Path(dfs.getWorkingDirectory()+"/TestDirectory/subDirectory/");

dfs.copyFromLocalFile(src, dst);


Here src and dst are the Path objects created for specifying the local file system path where file is located and HDFS path where file is required to be copied respectively. copyFromLocalFile() method is used for copying file from local file system to HDFS.

Similarly, file can also be copied from HDFS to local file system. Code below shows how to copy file from HDFS to local file system.

Path src = new Path(dfs.getWorkingDirectory()+"/TestDirectory/subDirectory/file1.txt");

Path dst = new Path("E://HDFS/");

dfs.copyToLocalFile(src, dst);

Here copyToLocalFile() method is used for copying file from HDFS to local file system.


Creating a file and writing data in it

It is also possible to create a file in HDFS and write data into it. So, if required, instead of directly copying the file from the local file system, a file can first be created and then data can be written into it.
The code below shows how to create a file named "file2.txt" in the HDFS directory.

Path src = new Path(dfs.getWorkingDirectory()+"/TestDirectory/subDirectory/file2.txt");

dfs.createNewFile(src);


Here createNewFile() method will create the file in HDFS based on the input provided in src object.

Now as the file is created, data can be written in it. Code below shows how to write data present in the “file1.txt” of local file system to “file2.txt” of HDFS.

Path src = new Path(dfs.getWorkingDirectory()+"/TestDirectory/subDirectory/file2.txt");

FileInputStream fis = new FileInputStream("E://HDFS/file1.txt");

int len = fis.available();

byte[] btr = new byte[len];

fis.read(btr);

FSDataOutputStream fs = dfs.create(src);

fs.write(btr);

fs.close();


Here write() method of FSDataOutputStream is used to write data in file located in HDFS.

Reading data from a file

It is always necessary to read the data from file for performing various operations on data. It is possible to read data from the file which is stored in HDFS. 
Code below shows how to retrieve data from the file present in the HDFS. Here data is read from the file (file1.txt) which is present in the directory (subDirectory) that was created earlier.

Path src = new Path(dfs.getWorkingDirectory()+"/TestDirectory/subDirectory/file1.txt");

FSDataInputStream fs = dfs.open(src);

String str = null;

while ((str = fs.readLine()) != null)
{
System.out.println(str);
}


Here the readLine() method of FSDataInputStream is used to read data from the file located in HDFS. src is the Path object that specifies the path of the file in HDFS to be read.

Miscellaneous operations that can be performed on HDFS

Below are some of the basic operations that can be performed on HDFS.

Below is the code that can be used to check whether a particular file or directory exists in HDFS. It returns true if it exists and false otherwise. The dfs.exists() method is used for this.

Path src = new Path(dfs.getWorkingDirectory()+"/TestDirectory/HDFS/file1.txt");

System.out.println(dfs.exists(src));

Below is the code that can be used to check the default block size into which a file would be split. It returns the block size in bytes. The dfs.getDefaultBlockSize() method is used for this.

System.out.println(dfs.getDefaultBlockSize());

To check the default replication factor, the dfs.getDefaultReplication() method can be used as shown below.

System.out.println(dfs.getDefaultReplication());

To check whether a given HDFS path is a directory or a file, the dfs.isDirectory() or dfs.isFile() methods can be used as shown below.

Path src = new Path(dfs.getWorkingDirectory()+"/TestDirectory/subDirectory/file1.txt");
System.out.println(dfs.isDirectory(src));
System.out.println(dfs.isFile(src));

Conclusion
So we just learned some of the basics of the Hadoop Distributed File System: how to create and delete a directory, how to copy files between HDFS and the local file system, how to create and delete files in a directory, how to write data to a file, and how to read data from a file. We also saw various other operations that can be performed on HDFS. From all of this we can say that HDFS is easy to use for data storage and retrieval.
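To tie the snippets together, here is a small self-contained program (my addition; the host, port and paths are the same illustrative values used throughout the tutorial) that runs through the main operations in one go, using the Hadoop 0.19-era API shown above:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsQuickTour {
    public static void main(String[] args) throws Exception {
        // Connect to the namenode (same host and port as in the tutorial).
        Configuration config = new Configuration();
        config.set("fs.default.name", "hdfs://127.0.0.1:9000/");
        FileSystem dfs = FileSystem.get(config);

        // Create a directory under the working directory.
        Path dir = new Path(dfs.getWorkingDirectory() + "/TestDirectory");
        dfs.mkdirs(dir);

        // Create a file and write a line into it.
        Path file = new Path(dir, "file2.txt");
        FSDataOutputStream out = dfs.create(file);
        out.write("hello HDFS".getBytes());
        out.close();

        // Query some properties.
        System.out.println("exists: " + dfs.exists(file));
        System.out.println("block size: " + dfs.getDefaultBlockSize());
        System.out.println("replication: " + dfs.getDefaultReplication());

        // Clean up: delete the test directory recursively.
        dfs.delete(dir, true);
    }
}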

References:
http://hadoop.apache.org/common/docs/current/hdfs_design.html

http://en.wikipedia.org/wiki/Hadoop

posted @ 2011-05-17 10:43 ivaneeo

This article describes the election of the ZooKeeper server leader. ZooKeeper uses the Paxos algorithm (mainly Fast Paxos) to elect its leader; two implementations are described here: LeaderElection and FastLeaderElection.

A few things need to be clear first:

  • How does one server know about the other servers?


In ZooKeeper the number of servers in a cluster is fixed, and the IP and port each server uses for the election are listed in the configuration file.

  • Besides IP and port, is there another way to identify a server?

Each server has a unique numeric id, assigned according to the configuration file. This step has to be done by hand at deployment time: create a file named myid in the directory that stores the data files and write the server's own number into it. This number matters when the values the servers propose are identical.

  • What is required to become the leader?

Agreement from n/2 + 1 servers (meaning n/2 + 1 servers must agree on the server whose zxid is the largest of all servers).

  • Does the ZooKeeper election use UDP or TCP?

The election mainly uses UDP; there is also an implementation based on TCP, but the two implementations described here both use UDP.

  • What states does a ZooKeeper server have?

LOOKING: the initial state

LEADING: the leader state

FOLLOWING: the follower state

  • What if all zxids are the same (for example, right after initialization) and n/2+1 servers may not be able to agree?

Every server in ZooKeeper has an ID; the IDs are unique and can be ordered, so in this situation ZooKeeper recommends the server with the largest ID as the leader.

  • How does the leader know a follower is still alive, and how does a follower know the leader is still alive?

The leader pings the followers periodically and the followers ping the leader periodically; when a follower finds it can no longer ping the leader, it changes its own state to LOOKING and starts a new round of election.

Terminology

zookeeper Server: a server in the ZooKeeper cluster, simply called "Server" below

zxid (zookeeper transaction id): the ZooKeeper transaction id. It is the key factor in whether a server can become the leader during the election, and it determines whom the current server gives its vote to (it is the "value" in the election; the other factor is the id).

myid/id (zookeeper server id): the ZooKeeper server id, another factor in whether a server can become the leader

epoch/logicalclock: describes whether the leader has changed. Every server starts with an epoch of 0; the epoch is incremented by 1 when a new election starts and by 1 again when the election completes.

tag/sequencer: the message number

xid: a randomly generated number that serves the same purpose as the epoch

Fast Paxos message flow compared with Basic Paxos

Message flow diagrams

  • Basic Paxos message flow
Client   Proposer      Acceptor     Learner
|         |          |  |  |       |  |
X-------->|          |  |  |       |  |  Request
|         X--------->|->|->|       |  |  Prepare(N)             // propose to every server
|         |<---------X--X--X       |  |  Promise(N,{Va,Vb,Vc})  // reply to the proposer whether the proposal is accepted (if not, go back to the previous step)
|         X--------->|->|->|       |  |  Accept!(N,Vn)          // send the accept message to every server
|         |<---------X--X--X------>|->|  Accepted(N,Vn)         // reply to the proposer that the proposal has been accepted
|<---------------------------------X--X  Response
|         |          |  |  |       |  |
  • Fast Paxos message flow

An election with no conflict:

Client    Leader         Acceptor      Learner
|         |          |  |  |  |       |  |
|         X--------->|->|->|->|       |  |  Any(N,I,Recovery)
|         |          |  |  |  |       |  |
X------------------->|->|->|->|       |  |  Accept!(N,I,W)   // propose to every server; each server accepts the proposal when it receives the message
|         |<---------X--X--X--X------>|->|  Accepted(N,I,W)  // send the proposer the message that the proposal has been accepted
|<------------------------------------X--X  Response(W)
|         |          |  |  |  |       |  |

The first implementation: LeaderElection

LeaderElection is the simplest implementation of Fast Paxos. After starting, each server asks every other server whom it is voting for; once it has received replies from all servers it works out which server has the largest zxid and sets that server as the one it will vote for next time.


Each server has a response thread and an election thread. Let's first look at what each thread does.

The response thread

Its job is to passively accept requests from the other servers and reply according to its own current state; every reply carries the server's own id and xid. What it replies depends on its state:

In the LOOKING state:

the information about the server it currently recommends (id, zxid)

In the LEADING state:

its own myid and the id of the server it recommended last time

In the FOLLOWING state:

the id of the current leader and the zxid of the last transaction it processed

The election thread

The election thread runs on the server that initiates the election. Its main job is to tally the votes and pick the recommended server. It first sends a query to every server (including itself); each queried server replies according to its own state. When the election thread receives a reply it verifies that the reply belongs to the query it sent (by checking the xid), records the responder's id (myid) in the list of servers queried in this round, and records the leader proposed by the responder (id, zxid) in the vote table for this round. After all servers have been queried it filters and tallies the results, determines which server wins this round, and sets the server with the largest zxid as the server it will recommend (which may be itself or another server, depending on the votes; in the first round every server votes for itself). If the winning server has gathered n/2 + 1 votes, the thread sets it as the recommended leader and updates its own state based on the winner. Every server repeats this procedure until a leader is elected.

Now that we know what each thread does, let's look at the election process.

  • A server joining during the election

Whenever a server starts it initiates an election, driven by its election thread. Each server then learns which server currently has the largest zxid; if that server did not gather n/2+1 votes this round, everyone votes for the server with the largest zxid in the next round. Repeating this process eventually elects a leader.

  • A server leaving during the election

As long as n/2+1 servers stay alive there is no problem; with fewer than n/2+1 servers alive no leader can be elected.

  • The leader dying during the election

Once a leader has been elected, the state each server should be in (FOLLOWING) is already determined. Since the leader is dead we simply ignore it; the other followers continue the normal flow, and once that flow completes all followers ping the leader. When the ping fails they change their state from FOLLOWING to LOOKING and start a new round of election.

  • The leader dying after the election has completed

This is handled the same way as a leader dying during the election, so it is not described again.

The second implementation: FastLeaderElection

FastLeaderElection is a standard Fast Paxos implementation. A server first proposes to all servers that it should become the leader; when the other servers receive the proposal they resolve any epoch and zxid conflicts, accept the proposal, and then send back a message saying that the proposal has been accepted.

Data structures

The local message structure:

static public class Notification {
    long leader;   // the id of the recommended server

    long zxid;     // the zxid of the recommended server (zookeeper transaction id)

    long epoch;    // describes whether the leader has changed (every server starts with a logicalclock of 0)

    QuorumPeer.ServerState state;   // the sender's current state
    InetSocketAddress addr;         // the sender's IP address
}

The network message structure:

static public class ToSend {

    int type;      // message type
    long leader;   // server id
    long zxid;     // the server's zxid
    long epoch;    // the server's epoch
    QuorumPeer.ServerState state;   // the server's state
    long tag;      // message number

    InetSocketAddress addr;

}

How a server implements it

Each server has a receive thread pool (3 threads) and a send thread pool (3 threads). When no election is in progress both pools block, and they only wake up to process messages when a message arrives. In addition, each server has an election thread (run by the thread that can initiate an election). Let's look at what each thread does:

Passive receive side (receive thread pool):

notification: first check whether the zxid and epoch currently recommended by this server are still the best, using (currentServer.epoch <= currentMsg.epoch && (currentMsg.zxid > currentServer.zxid || (currentMsg.zxid == currentServer.zxid && currentMsg.id > currentServer.id))). If not, update the values recommended by the current server with the zxid, epoch and id from the message. Then convert the received message into a Notification, put it into the receive queue, and send an ack message back to the sender.

ack: put the message number into the ack queue and check whether the sender's state is LOOKING; if it is not, a leader has already been elected, so convert the received message into a Notification and put it into the receive queue.

Active send side (send thread pool):

notification: convert the outgoing Notification into a ToSend message, send it to the peer and wait for a reply. If no reply arrives before the wait ends, retry three times; if there is still no reply after the retries, check whether the current election (epoch) has changed. If it has not changed, put the message back into the send queue, and keep repeating until a leader is elected or a reply is received.

ack: simply send this server's own information to the peer.

Election initiator (election thread):

First increment the server's own epoch, then generate notification messages and put them into the send queue, one message per server configured in the system, so that every server receives the message. While the current server's state is LOOKING it keeps checking whether the receive queue contains messages, and handles each message according to the sender's state.

If the sender is in the LOOKING state:

First check whether the epoch in the message is valid and larger than the current server's. If it is larger, update the epoch, check whether the zxid and id in the message are larger than those of the currently recommended server, and if so update them, generate a new notification message for the send queue and clear the vote tally. If the message's epoch is smaller, do nothing. If the epochs are equal, check the zxid and id in the message; if they are larger, update the current server's information and put a new notification message into the send queue. Then record the sender's IP and vote in the tally table, compute the result, and set the server's own state according to the result.

If the sender is in the LEADING state:

Record the sender's IP and vote in the tally table (a separate table is used here), compute the result, and set the server's own state according to the result.

If the sender is in the FOLLOWING state:

Record the sender's IP and vote in the tally table (a separate table is used here), compute the result, and set the server's own state according to the result.

Now that we know what each thread does, let's look at the election process; it is the same as in the first implementation.

  • A server joining during the election

Whenever a server starts it initiates an election, driven by its election thread, telling the other servers its zxid and epoch. In the end every server learns which server has the largest zxid and votes for that server in the next round; repeating this process eventually elects a leader.

  • A server leaving during the election

As long as n/2+1 servers stay alive there is no problem; with fewer than n/2+1 servers alive no leader can be elected.

  • The leader dying during the election

Once a leader has been elected, the state each server should be in (FOLLOWING) is already determined. Since the leader is dead we simply ignore it; the other followers continue the normal flow, and once that flow completes all followers ping the leader. When the ping fails they change their state from FOLLOWING to LOOKING and start a new round of election.

  • The leader dying after the election has completed

This is handled the same way as a leader dying during the election, so it is not described again.

posted @ 2011-05-05 13:16 ivaneeo

     Abstract: An introduction to ZooKeeper. ZooKeeper is an open-source distributed service that provides distributed coordination, distributed synchronization, configuration management and other features. What it implements is essentially the same as Google's Chubby. The official ZooKeeper site has a classic overview article, "ZooKeeper: A Distributed Coordination Service for Distributed Applications"; here I...  (read the full post)
posted @ 2011-05-05 13:15 ivaneeo
