石建 | Fat Mind

November 2, 2010


http://incubator.apache.org/kafka/design.html

1.Why we built this
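    Activity stream data (asd) is a normal part of any website and reflects how the site is used, e.g. which content is searched for or displayed. This data is usually logged to files and then periodically aggregated and analyzed. Operational data (od) is data about machine performance, together with operational data aggregated from various other sources.
    In recent years, asd and od have become a critical part of a website, so more sophisticated infrastructure is needed.
    Characteristics of the data:
        a. The high-throughput stream of immutable asd is a challenge for real-time processing; its volume can easily be 10x or 100x larger than other data on the site.

        b. Traditional log-file aggregation is a respectable and scalable way to support offline processing, but its latency is too high.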
    Kafka is intended to be a single queuing platform that can support both offline and online use cases.

2.Major Design Elements

There is a small number of major design decisions that make Kafka different from most other messaging systems:

  1. Kafka is designed for persistent messages as the common case (message persistence);
  2. Throughput rather than features is the primary design constraint (throughput is the first requirement);
  3. State about what has been consumed is maintained as part of the consumer, not the server (state is kept by the client);
  4. Kafka is explicitly distributed. It is assumed that producers, brokers, and consumers are all spread over multiple machines (it must be distributed).

3.Basics
    Messages are the fundamental unit of communication.
    Messages are published to a topic by a producer, which means they are physically sent to a server acting as a broker;
    When multiple consumers subscribe to a topic, every message in that topic is delivered to each of them;
    Kafka is distributed: producers, brokers, and consumers can each consist of multiple machines in a cluster that cooperate as a logical group;
    Within a consumer group, each message is consumed by exactly one of the group's consumer processes. A more common case in our own usage is that we have multiple logical consumer groups, each consisting of a cluster of consuming machines that act as a logical whole.
    No matter how many consumers a topic has, Kafka stores each message only once.
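
The delivery rule above (every consumer group sees every message, but only one process inside a group consumes it) can be sketched with a small in-memory model. This is only an illustration: `deliver` and the round-robin choice of owner are assumptions made for the example, not Kafka's API or its real assignment strategy.

```javascript
// Hypothetical in-memory model of topic delivery; not Kafka's API.
// groups maps a group name to the list of consumer processes in that group.
function deliver(messages, groups) {
    var received = {};
    Object.keys(groups).forEach(function (group) {
        groups[group].forEach(function (consumer) {
            received[group + "/" + consumer] = [];
        });
    });
    messages.forEach(function (message, index) {
        Object.keys(groups).forEach(function (group) {
            var members = groups[group];
            // Every group gets the message, but exactly one member consumes it.
            // Round-robin is an assumption made only for this sketch.
            var owner = members[index % members.length];
            received[group + "/" + owner].push(message);
        });
    });
    return received;
}

var delivered = deliver(["m0", "m1", "m2", "m3"], {
    search: ["s1", "s2"], // two processes share the stream
    audit: ["a1"]         // a single process sees everything
});
```

The messages themselves exist once in the `messages` array; only the per-consumer lists differ.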

4.Message Persistence and Caching

4.1 Don't fear the filesystem !
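    Kafka relies entirely on the filesystem to store and cache messages;
    The common intuition is that disks are "slow", which makes people doubt that a persistent structure can offer competitive performance. In fact, how slow or fast a disk is depends entirely on how it is used; a properly designed disk structure can often be as fast as the network.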
    http://baike.baidu.com/view/969385.htm (RAID-5)
    http://www.china001.com/show_hdr.php?xname=PPDDMV0&dname=66IP341&xpos=172 (disk types)
    Sequential disk reads and writes perform very well: linear writes on a six-disk 7200rpm SATA RAID-5 array run at about 300MB/sec. These linear reads and writes are the most predictable of all usage patterns, and hence the one detected and optimized best by the operating system using read-ahead and write-behind techniques.
    Modern operating systems use memory as a disk cache. Any modern OS will happily divert all free memory to disk caching with little performance penalty when the memory is reclaimed. All disk reads and writes will go through this unified cache.
    JVM: a. the memory overhead of objects is very high, often twice the size of the data stored; b. as the amount of heap data grows, GC becomes more and more expensive;
    As a result of these factors, using the filesystem and relying on pagecache is superior to maintaining an in-memory cache or other structure.
    Compact data storage: doing so will result in a cache of up to 28-30GB on a 32GB machine without GC penalties.
    This suggests a design which is very simple: rather than maintaining as much as possible in memory and flushing to the filesystem only when necessary, Kafka inverts that.
    All data is written immediately to a persistent file without calling flush, which means it only reaches the OS pagecache and is flushed to disk by the OS at some later time. Then we add a configuration driven flush policy to allow the user of the system to control how often data is flushed to the physical disk (every N messages or every M seconds) to put a bound on the amount of data "at risk" in the event of a hard crash.

4.2 Constant Time Suffices
    
The persistent data structure used in messaging systems metadata is often a BTree. BTrees are the most versatile data structure available, and make it possible to support a wide variety of transactional and non-transactional semantics in the messaging system.
    Disk seeks come at 10 ms a pop, and each disk can do only one seek at a time so parallelism is limited. Hence even a handful of disk seeks leads to very high overhead. 
    Furthermore BTrees require a very sophisticated page or row locking implementation to avoid locking the entire tree on each operation.
The implementation must pay a fairly high price for row-locking or else effectively serialize all reads.
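    The metadata of persistent messages is usually kept in a BTree, but as an on-disk structure its cost is too high. Reasons: disk seeks, and the locking needed to avoid locking the whole tree.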
    
Intuitively a persistent queue could be built on simple reads and appends to files as is commonly the case with logging solutions.
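    A persistent queue can be built on simple reads and appends to files. It therefore cannot support some of the BTree semantics, but the benefit is that all operations are O(1) and reads and writes do not block each other.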
    
The performance is completely decoupled from the data size--one server can now take full advantage of a number of cheap, low-rotational speed 1+TB SATA drives. 
Though they have poor seek performance, these drives often have comparable performance for large reads and writes at 1/3 the price and 3x the capacity.
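
A minimal sketch of the append-only idea, assuming an in-memory array stands in for the log file and a sequence number stands in for the offset (both are simplifications for illustration; `AppendLog` is a hypothetical name). Appending adds to the end and reading is a slice from a known position, so neither needs to search a tree.

```javascript
// Hypothetical append-only log; an array stands in for the file on disk.
function AppendLog() {
    this.entries = [];
}

// Append at the end and return the position (offset) of the new message.
AppendLog.prototype.append = function (message) {
    this.entries.push(message);
    return this.entries.length - 1;
};

// Read up to maxCount messages starting at a known offset.
AppendLog.prototype.read = function (offset, maxCount) {
    return this.entries.slice(offset, offset + maxCount);
};

var log = new AppendLog();
log.append("a");
log.append("b");
log.append("c");
var chunk = log.read(1, 2); // the two messages starting at offset 1
```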

4.3 Maximizing Efficiency
    Furthermore we assume each message published is read at least once (and often multiple times), hence we optimize for consumption rather than production.
    
There are two common causes of inefficiency: too many network requests and excessive byte copying.
    Too many network requests: APIs are built around a "message set" abstraction. This allows network requests to group messages together and amortize the overhead of the network roundtrip rather than sending a single message at a time. Only batch APIs are provided, so the cost of each network round trip is shared across a group of messages rather than paid per message.
    Excessive byte copying: The message log maintained by the broker is itself just a directory of message sets that have been written to disk. Maintaining this common format allows optimization of the most important operation: network transfer of persistent log chunks.
    To understand the impact of sendfile, it is important to understand the common data path for transfer of data from file to socket:
  1. The operating system reads data from the disk into pagecache in kernel space
  2. The application reads the data from kernel space into a user-space buffer
  3. The application writes the data back into kernel space into a socket buffer
  4. The operating system copies the data from the socket buffer to the NIC buffer where it is sent over the network
    With the zero-copy support provided by the OS (sendfile), only the final copy to the NIC buffer is needed.

4.4 End-to-end Batch Compression
    In many cases the bottleneck is actually not CPU but network. This is particularly true for a data pipeline that needs to send messages across data centers.
Efficient compression requires compressing multiple messages together rather than compressing each message individually. 
Ideally this would be possible in an end-to-end fashion — that is, data would be compressed prior to sending by the producer and remain compressed on the server, only being decompressed by the eventual consumers. 
    
A batch of messages can be clumped together compressed and sent to the server in this form. This batch of messages will be delivered all to the same consumer and will remain in compressed form until it arrives there.
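    Understanding: the Kafka producer API provides batch compression; the broker does nothing to such a batch and passes it on to the consumer as a unit, still in compressed form.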

4.5 Consumer state
    Keeping track of what has been consumed is one of the key things a messaging system must provide. 
State tracking requires updating a persistent entity and potentially causes random accesses. 
    
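Most messaging systems keep metadata about what messages have been consumed on the broker. That is, as a message is handed out to a consumer, the broker records that fact locally.
    Problem: if the consumer fails while processing a message, that message is lost. Improvement: the consumer sends the broker an ack after consuming each message, and if the broker does not receive the ack within a timeout it resends the message.
    Problems with that: 1. if a message is consumed successfully but the ack is not delivered, it is consumed twice; 2. now the broker must keep multiple states about every single message; 3. when the broker spans multiple machines, this state has to be synchronized between them.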

4.5.1 Message delivery semantics
    
So clearly there are multiple possible message delivery guarantees that could be provided: at most once, at least once, exactly once.
    
This problem is heavily studied, and is a variation of the "transaction commit" problem. Algorithms that provide exactly once semantics exist, two- or three-phase commits and Paxos variants being examples, but they come with some drawbacks. They typically require multiple round trips and may have poor guarantees of liveness (they can halt indefinitely). 
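    In short: message delivery semantics are a variation of the "transaction commit" problem. Algorithms with exactly-once semantics exist (two- or three-phase commit, Paxos), but they typically need multiple round trips and have poor guarantees of liveness.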
    
Kafka does two unusual things with respect to metadata. 
First the stream is partitioned on the brokers into a set of distinct partitions. 
Within a partition messages are stored in the order in which they arrive at the broker, and will be given out to consumers in that same order. This means that rather than store metadata for each message (marking it as consumed, say), we just need to store the "high water mark" for each combination of consumer, topic, and partition.  
    
4.5.2 Consumer state
    In Kafka, the consumers are responsible for maintaining state information (offset) on what has been consumed. 
Typically, the Kafka consumer library writes their state data to zookeeper.
    
This solves a distributed consensus problem, by removing the distributed part!
    
There is a side benefit of this decision. A consumer can deliberately rewind back to an old offset and re-consume data.
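
To make the offset idea concrete, here is a small sketch in which the consumer, not the broker, owns a single number per partition and can move it backwards to re-consume. `OffsetConsumer`, `poll`, and `rewind` are hypothetical names chosen for this example, and an array stands in for one partition's log.

```javascript
// Hypothetical consumer that owns its own position in one partition.
function OffsetConsumer(partitionLog) {
    this.log = partitionLog; // messages in arrival order
    this.offset = 0;         // the only consumption state; kept by the consumer
}

// Fetch the next messages and advance the offset past them.
OffsetConsumer.prototype.poll = function (maxCount) {
    var batch = this.log.slice(this.offset, this.offset + maxCount);
    this.offset += batch.length;
    return batch;
};

// Deliberately move back to an older offset to re-consume data.
OffsetConsumer.prototype.rewind = function (oldOffset) {
    if (oldOffset < 0 || oldOffset > this.offset) {
        throw new RangeError("can only rewind to an offset already consumed");
    }
    this.offset = oldOffset;
};

var consumer = new OffsetConsumer(["a", "b", "c", "d"]);
var first = consumer.poll(3);  // a, b, c
consumer.rewind(1);
var again = consumer.poll(2);  // b, c once more
```

Because the broker keeps no per-message state, re-consuming costs nothing on the broker side; the consumer simply asks from an earlier position.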

4.5.3 Push vs. pull
    
A related question is whether consumers should pull data from brokers or brokers should push data to the subscriber.
There are pros and cons to both approaches.
    However a push-based system has difficulty dealing with diverse consumers as the broker controls the rate at which data is transferred. The goal of push is to let the consumer consume at the maximum possible rate; unfortunately, when the rate of consumption falls below the rate of production, the consumer tends to be overwhelmed.
    
A pull-based system has the nicer property that the consumer simply falls behind and catches up when it can. The overload in a push system can be mitigated with some kind of backoff protocol by which the consumer can indicate it is overwhelmed, but getting the rate of transfer to fully utilize (but never over-utilize) the consumer is trickier than it seems. Previous attempts at building systems in this fashion led us to go with a more traditional pull model. Pull does not have the push problem, and it still lets the consumer's capacity be fully used.
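
The "falls behind and catches up" behavior can be illustrated with a toy simulation. Everything here is an assumption for the sake of the example: time is divided into ticks, `simulatePull` is a made-up helper, and the consumer fetches at most its own capacity per tick, so the backlog stays on the broker instead of overwhelming the consumer.

```javascript
// Toy simulation of a pull consumer; not a model of Kafka internals.
// producedPerTick: how many messages arrive at the broker in each tick.
// capacityPerTick: how many messages the consumer can handle per tick.
// Returns the backlog left on the broker after each tick.
function simulatePull(producedPerTick, capacityPerTick) {
    var backlog = 0;
    var history = [];
    producedPerTick.forEach(function (produced) {
        backlog += produced;
        // The consumer decides how much to fetch, never more than it can handle.
        var fetched = Math.min(backlog, capacityPerTick);
        backlog -= fetched;
        history.push(backlog);
    });
    return history;
}

// A burst of 5 messages per tick against a capacity of 3 per tick.
var backlogOverTime = simulatePull([5, 5, 0, 0, 0], 3); // [2, 4, 1, 0, 0]
```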

5. Distribution
    Kafka is built to be run across a cluster of machines as the common case. There is no central "master" node. Brokers are peers to each other and can be added and removed at anytime without any manual configuration changes. Similarly, producers and consumers can be started dynamically at any time. Each broker registers some metadata (e.g., available topics) in Zookeeper. Producers and consumers can use Zookeeper to discover topics and to co-ordinate the production and consumption. The details of producers and consumers will be described below.

6. Producer

6.1 Automatic producer load balancing
    Kafka supports client-side load balancing for message producers or use of a dedicated load balancer to balance TCP connections. 
 
    The advantage of using a level-4 load balancer is that each producer only needs a single TCP connection, and no connection to zookeeper is needed. 
The disadvantage is that the balancing is done at the TCP connection level, and hence it may not be well balanced (if some producers produce many more messages than others, evenly dividing up the connections per broker may not result in evenly dividing up the messages per broker).
    
Client-side zookeeper-based load balancing solves some of these problems. It allows the producer to dynamically discover new brokers, and balance load on a per-request basis. It allows the producer to partition data according to some key instead of randomly.

    The working of the zookeeper-based load balancing is described below. Zookeeper watchers are registered on the following events—

  • a new broker comes up
  • a broker goes down
  • a new topic is registered
  • a broker gets registered for an existing topic

    Internally, the producer maintains an elastic pool of connections to the brokers, one per broker. This pool is kept updated to establish/maintain connections to all the live brokers, through the zookeeper watcher callbacks. When a producer request for a particular topic comes in, a broker partition is picked by the partitioner (see section on semantic partitioning). The available producer connection is used from the pool to send the data to the selected broker partition.
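    The producer manages its connections to the brokers through zk. For each request it computes the partition with the partition rule, picks the corresponding connection from the pool, and sends the data.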

6.2 Asynchronous send

    Asynchronous non-blocking operations are fundamental to scaling messaging systems.
    
This allows buffering of produce requests in an in-memory queue and batch sends that are triggered by a time interval or a pre-configured batch size. 
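
A minimal sketch of such a buffer, assuming the caller passes in the current time and a `send` callback (the names `BatchingProducer`, `enqueue`, and `maybeFlush` are hypothetical and not part of any Kafka client). A batch goes out when it reaches the configured size or when its oldest message has waited for the configured interval.

```javascript
// Hypothetical batching buffer; send(batch) is supplied by the caller.
function BatchingProducer(batchSize, intervalMs, send) {
    this.batchSize = batchSize;
    this.intervalMs = intervalMs;
    this.send = send;
    this.queue = [];
    this.firstQueuedAt = null; // arrival time of the oldest buffered message
}

// Buffer one message; returns immediately without waiting for the network.
BatchingProducer.prototype.enqueue = function (message, nowMs) {
    if (this.queue.length === 0) this.firstQueuedAt = nowMs;
    this.queue.push(message);
    this.maybeFlush(nowMs);
};

// Send the buffered batch if it is full or has waited long enough.
BatchingProducer.prototype.maybeFlush = function (nowMs) {
    if (this.queue.length === 0) return;
    var full = this.queue.length >= this.batchSize;
    var expired = nowMs - this.firstQueuedAt >= this.intervalMs;
    if (full || expired) {
        this.send(this.queue);
        this.queue = [];
        this.firstQueuedAt = null;
    }
};
```

In a real client `maybeFlush` would be driven by a timer; passing the time in explicitly keeps the sketch deterministic.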

6.3 Semantic partitioning
    
The producer has the capability to be able to semantically map messages to the available kafka nodes and partitions. 
This allows partitioning the stream of messages with some semantic partition function based on some key in the message to spread them over broker machines. 
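
One common way to build such a partition function is to hash the key and take the result modulo the number of partitions, so that all messages with the same key land on the same partition. The sketch below assumes string keys and a simple multiply-by-31 hash; `hashKey` and `pickPartition` are illustrative names, and the source does not say which function Kafka's partitioner actually uses.

```javascript
// Simple string hash kept within unsigned 32-bit range (an illustrative choice).
function hashKey(key) {
    var h = 0;
    for (var i = 0; i < key.length; i++) {
        h = (h * 31 + key.charCodeAt(i)) >>> 0;
    }
    return h;
}

// Map a message key to one of partitionCount partitions.
function pickPartition(key, partitionCount) {
    return hashKey(key) % partitionCount;
}

// Messages for the same user always go to the same partition.
var p1 = pickPartition("user-42", 4);
var p2 = pickPartition("user-42", 4);
```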


posted @ 2013-07-06 14:57 石建 | Fat Mind

1. JavaScript code, file login.js

// Write the user's login information into a cookie
function SetCookie(form) // takes the login form and reads its name and password fields
{
    var name = form.name.value;
    var password = form.password.value;
    var Days = 1; // this cookie is kept for 1 day
    var exp = new Date(); // take the current time, add the retention period, and use the result as the cookie's expiry
    exp.setTime(exp.getTime() + Days*24*60*60*1000);
    document.cookie = "user=" + escape(name) + "/" + escape(password) + ";expires=" + exp.toGMTString();
}
// Read a cookie -- regular expression version (I don't know regular expressions yet; need to learn them)
function getCookie(name)
{
    var arr = document.cookie.match(new RegExp("(^| )" + name + "=([^;]*)(;|$)"));
    if (arr != null) return unescape(arr[2]);
    return null;
}
// Read a cookie -- plain implementation
function readCookie(form) {
    var cookieValue = "";
    var search = "user=";
    if (document.cookie.length > 0) {
        var offset = document.cookie.indexOf(search);
        if (offset != -1) {
            offset += search.length;
            var end = document.cookie.indexOf(";", offset);
            if (end == -1)
                end = document.cookie.length;
            // Extract the value stored in the cookie
            cookieValue = unescape(document.cookie.substring(offset, end));
            if (cookieValue != null) {
                var str = cookieValue.split("/");
                form.name.value = str[0];
                form.password.value = str[1];
            }
        }
    }
}
// Delete a cookie (in a servlet: set the max age to 0; setting it to -1 gives the cookie the same lifetime as the session); JavaScript seems to behave a little differently
function delCookie()
{
    var name = "user"; // must match the cookie name written by SetCookie
    var exp = new Date();
    exp.setTime(exp.getTime() - 1);
    var cval = getCookie(name);
    if (cval != null) document.cookie = name + "=" + cval + ";expires=" + exp.toGMTString();
}
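
The cookie-reading functions above depend on `document.cookie` and on a fixed cookie name. As a possible alternative, the parsing can be pulled into a pure function that takes the cookie string as an argument, which also makes it testable outside a browser. `parseCookies` is a hypothetical helper, not part of login.js; it uses `unescape` only to stay consistent with the `escape` call in SetCookie.

```javascript
// Hypothetical helper: turn a "k1=v1; k2=v2" cookie string into an object.
function parseCookies(cookieString) {
    var cookies = {};
    if (!cookieString) return cookies;
    cookieString.split(";").forEach(function (pair) {
        var eq = pair.indexOf("=");
        if (eq === -1) return; // skip fragments without a value
        var key = pair.slice(0, eq).trim();
        // unescape mirrors the escape() used when the cookie was written
        cookies[key] = unescape(pair.slice(eq + 1).trim());
    });
    return cookies;
}

// In a browser this would be called as parseCookies(document.cookie).
var parsed = parseCookies("user=tom/secret; theme=dark");
var parts = parsed.user.split("/"); // same name/password layout as SetCookie
```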

 

2. JSP code, file login.jsp

<%@ page contentType="text/html; charset=gb2312" language="java"
    import="java.sql.*" errorPage=""%>
    
<html>
    <head>
        <meta http-equiv="Content-Type" content="text/html; charset=gb2312">
        <title>Controlling cookies with JavaScript</title>
        <link href="css/style.css" rel="stylesheet" type="text/css">
        <script type="text/javascript" src="js/login.js"></script>
    </head>
    <script language="javascript">
    function checkEmpty(form){
        for(i=0;i<form.length;i++){
            if(form.elements[i].value==""){
                alert("Form fields must not be empty");
                return false;
            }
        }
    }
</script>
    <body onload="readCookie(form)"> <!-- cookie handled by JavaScript -->
        <div align="center">
            <table width="324" height="225" border="0" cellpadding="0" cellspacing="0">
                <tr height="50">
                    <td ></td>
                </tr>
                <tr align="center">
                    <td background="images/back.jpg">
                        <br>
                        <br>
                        Login
                        <form name="form" method="post" action="" onSubmit="return checkEmpty(form)">
                            <input type="hidden" name="id" value="-1">
                            <table width="268" border="1" cellpadding="0" cellspacing="0">
                                <tr align="center">
                                    <td width="63" height="30">
                                        Username:
                                    </td>
                                    <td width="199">
                                        <input type="text" name="name" id="name">
                                    </td>
                                </tr>
                                <tr align="center">
                                    <td height="30">
                                        Password:
                                    </td>
                                    <td>
                                        <input type="password" name="password" id="password">
                                    </td>
                                </tr>
                            </table>
                            <br>
                            <input type="submit" value="Submit">
                            <input type="checkbox" name="cookie" onclick="SetCookie(form)">Remember me
                        </form>
                    </td>
                </tr>
            </table>
        </div>
    </body>
</html>

 


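Goal: when you open login.jsp again, the form is already filled in with your previous login information!


Problems: 1. The cookie access in the JavaScript is hardcoded and not very flexible!
            2. JavaScript stores a cookie as a string, so when reading it back you have to parse it in the same format you wrote it in!
            3. I originally wanted to implement automatic login, but every one of my pages does a session check! One side is on the client and the other on the server, so I did not get it working!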

 

 

posted @ 2012-09-09 15:18 石建 | Fat Mind
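1. Variable types
  - undefined
  - null
  - string
        - difference between == and ===
  - number
  - boolean
  - string, number, and boolean each have a corresponding "object class"
2. Functions
  - Defining functions
        - the function keyword
        - parameters (see the examples), arguments
        - variable declarations inside a function, and the difference var makes
  - Scope
        - chain structure (an inner function can see the outer function's variables)
  - Anonymous functions
        - use cases (non-reusable code, e.g. a JSONP callback)
        - how this behaves
Examples: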
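var add = function(x) {
    return x + 1;
}
add(1,2,3); // any number of arguments may be passed (extras are ignored here), similar to varargs (int... x) in Java

var fn = function(name, pass) {
    alert(name);
    alert(pass);
};
fn("hello","1234",5); // arguments are bound to parameters in the order they are passed

var name = "windows";
var fn = function() {
    var name = "hello";
    alert(this.name);
}
fn(); // "windows": inside the anonymous function, this refers to the window object

var name = "windows";
var fn = function() {
    name = "hello";
    alert(this.name);
}
fn(); // name is not declared with var inside the function, so it is the global variable; this refers to window, so the result is "hello"

function add(a) {
    return ++a;
}
var fn = function(x,add){
    return add(x);
}
fn(1, add);  // a function passed as an argument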

3. Closures
http://www.ruanyifeng.com/blog/2009/08/learning_javascript_closures.html [good]
Closures in other languages: http://www.ibm.com/developerworks/cn/linux/l-cn-closure/
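
As a small illustration of the closure idea (an example added here, not taken from the linked articles): the inner function keeps access to the variable `count` of the call that created it, even after `makeCounter` has returned. `makeCounter` is just an illustrative name.

```javascript
// Each call to makeCounter creates a fresh, private count variable.
function makeCounter() {
    var count = 0; // visible only to the function returned below
    return function () {
        count += 1;
        return count;
    };
}

var counterA = makeCounter();
var counterB = makeCounter();
counterA(); // 1
counterA(); // 2
counterB(); // 1, because counterB closes over its own count
```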

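4. Objects
    - new Object()
    - object literal
    - constructor function
    - steps performed by the operations above
        - create a new object
        - assign the constructor's scope to the new object (the new operator)
        - add properties and methods to the object
        - return the object

var obj = new Object();  // new Object() style
obj.name = 'zhangsan';

var obj = {                   // object literal style
    name : 'zhangsan',
    showName : function (){
        alert(this.name);
    }
};
obj.showName(); // showName alerts the name itself and returns nothing

function Person(name) { // constructor function
    this.name = name;
    this.showName = function(){
        return this.name;
    };
}
var obj = new Person("zhangsan"); // the new keyword is required; without it this is just an ordinary function call
obj.showName();
alert(obj.name);


Source: internal training slides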
posted @ 2012-05-20 13:50 石建 | Fat Mind


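1. A handle is an identifier; once we have an object's handle, we can perform any operation on that object.

2. A handle is not a pointer. The operating system uses a handle to locate a block of memory; the handle may be an identifier, a map key, or even a pointer, depending on how the operating system implements it.

To some extent, an fd plays the role of a handle;

Linux has corresponding mechanisms but no unified handle type: each kind of system resource is identified by its own type and operated on through its own interface.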

3.http://tech.ddvip.com/2009-06/1244006580122204_11.html

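At the operating-system level, file operations have a concept similar to FILE: in Linux it is called a file descriptor (fd), and in Windows it is called a handle (below, where there is no ambiguity, both are called handles). The user opens a file through some function to obtain a handle, and from then on manipulates the file through that handle.

 

The reason for designing handles this way is that a handle prevents users from reading or writing the kernel's file objects at will. In both Linux and Windows, a file handle is always associated with a kernel file object, but the details of that association are not visible to the user. The kernel can compute the address of the kernel file object from the handle, but this ability is not exposed to the user.

 

As a concrete example, in Linux the fds 0, 1, and 2 represent standard input, standard output, and standard error respectively. Fds obtained by opening files in a program start at 3 and increase. What exactly is an fd? In the kernel, every process has a private open-file table, which is an array of pointers, each element pointing to a kernel open-file object. The fd is simply an index into this table. When the user opens a file, the kernel creates an open-file object internally, finds a free entry in the table, points that entry at the new open-file object, and returns the entry's index as the fd. Because this table lives in the kernel and the user cannot access it, even a user who holds an fd cannot obtain the address of the open-file object and can only operate on it through the functions the system provides.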
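
The table lookup described above can be modeled in a few lines. This is a toy model, not kernel code: `FdTable` is a hypothetical name, strings stand in for kernel open-file objects, and the choice of the lowest free index follows the POSIX rule for open().

```javascript
// Toy model of a per-process open-file table; strings stand in for kernel objects.
function FdTable() {
    // fds 0, 1, 2 are already taken by the standard streams
    this.slots = ["stdin", "stdout", "stderr"];
}

// Store the file object in the lowest free slot and return that index as the fd.
FdTable.prototype.open = function (fileObject) {
    var fd = 0;
    while (fd < this.slots.length && this.slots[fd] !== null) {
        fd++;
    }
    this.slots[fd] = fileObject;
    return fd;
};

// Free the slot so that a later open can reuse the index.
FdTable.prototype.close = function (fd) {
    this.slots[fd] = null;
};

var table = new FdTable();
var fdA = table.open("a.txt"); // 3
var fdB = table.open("b.txt"); // 4
table.close(fdA);
var fdC = table.open("c.txt"); // 3 again, the freed slot is reused
```

The caller only ever holds the integer; the object it refers to stays inside the table, which is the point the paragraph above makes about handles.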

 

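In C, files are manipulated through the FILE structure. As one would expect, the FILE structure corresponds one-to-one with an fd: each FILE structure records the single fd it corresponds to.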


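Handle
http://zh.wikipedia.org/wiki/%E5%8F%A5%E6%9F%84

In programming, a handle is a special kind of smart pointer. When an application needs to refer to a block of memory or an object managed by another system (such as a database or the operating system), it uses a handle.

The difference between a handle and an ordinary pointer is that a pointer contains the memory address of the referenced object, whereas a handle is a reference identifier managed by the system, which the system can relocate to a different memory address. This indirect way of accessing objects strengthens the system's control over the referenced objects. (See encapsulation.)

In the memory management of operating systems of the 1980s (such as Mac OS and Windows), handles were widely used. Unix file descriptors are essentially handles as well. Like other desktop environments, the Windows API makes heavy use of handles to identify objects in the system and to establish a channel of communication between the operating system and user space. For example, a window on the desktop is identified by a handle of type HWND.

Today, larger memory capacities and virtual memory algorithms have made simpler pointers increasingly favored, while handles of the pointer-to-pointer kind have fallen out of favor. Even so, many operating systems still use the term handle for pointers to private objects and for the internal array indices that a process passes to its clients.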


 

posted @ 2012-04-06 14:02 石建 | Fat Mind

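Official site: http://code.google.com/p/powermock/ 

1. For those who already use mockito, I recommend reading the following parts

    - document [required]
        - getting started
        - motivation
    - mockito extends [required]
        - mockito 1.8+ usage
    - common
    - tutorial
    - faq [required]


2. Attachment: simplified examples of some powermock features used in real development (only to illustrate use of the powermock api). They mainly cover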

 

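- Modifying private fields
- Private methods
    - Testing private methods
    - Mock
    - Verify
- Static methods
    - Mock
    - Throwing exceptions
    - Verify
- Mocking some of a class's methods (partial mock)
- Mocking Java core library classes, e.g. Thread
- Mocking constructors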

/Files/shijian/powermock.rar



posted @ 2012-03-29 12:39 石建 | Fat Mind


Original post:
http://googleresearch.blogspot.com/2012/03/excellent-papers-for-2011.html

 

Excellent Papers for 2011

Posted by Corinna Cortes and Alfred Spector, Google Research

Googlers across the company actively engage with the scientific community by publishing technical papers, contributing open-source packages, working on standards, introducing new APIs and tools, giving talks and presentations, participating in ongoing technical debates, and much more. Our publications offer technical and algorithmic advances, feature aspects we learn as we develop novel products and services, and shed light on some of the technical challenges we face at Google.

 

In an effort to highlight some of our work, we periodically select a number of publications to be featured on this blog. We first posted a set of papers on this blog in mid-2010 and subsequently discussed them in more detail in the following blog postings. In a second round, we highlighted new noteworthy papers from the later half of 2010. This time we honor the influential papers authored or co-authored by Googlers covering all of 2011 -- covering roughly 10% of our total publications.  It’s tough choosing, so we may have left out some important papers.  So, do see the publications list to review the complete group.

 

In the coming weeks we will be offering a more in-depth look at these publications, but here are some summaries:

 

Audio processing

 

“Cascades of two-pole–two-zero asymmetric resonators are good models of peripheral auditory function”, Richard F. Lyon, Journal of the Acoustical Society of America, vol. 130 (2011), pp. 3893-3904.
Lyon's long title summarizes a result that he has been working toward over many years of modeling sound processing in the inner ear. This nonlinear cochlear model is shown to be "good" with respect to psychophysical data on masking, physiological data on mechanical and neural response, and computational efficiency. These properties derive from the close connection between wave propagation and filter cascades. This filter-cascade model of the ear is used as an efficient sound processor for several machine hearing projects at Google.

 

Electronic Commerce and Algorithms

 

“Online Vertex-Weighted Bipartite Matching and Single-bid Budgeted Allocations”, Gagan Aggarwal, Gagan Goel, Chinmay Karande, Aranyak Mehta, SODA 2011.
The authors introduce an elegant and powerful algorithmic technique to the area of online ad allocation and matching: a hybrid of random perturbations and greedy choice to make decisions on the fly. Their technique sheds new light on classic matching algorithms, and can be used, for example, to pick one among a set of relevant ads, without knowing in advance the demand for ad slots on future web page views. 

 

[Worth following]

 

“Milgram-routing in social networks”, Silvio Lattanzi, Alessandro Panconesi, D. Sivakumar, Proceedings of the 20th International Conference on World Wide Web, WWW 2011, pp. 725-734.
Milgram’s "six-degrees-of-separation experiment" and the fascinating small world hypothesis that follows from it, have generated a lot of interesting research in recent years. In this landmark experiment, Milgram showed that people unknown to each other are often connected by surprisingly short chains of acquaintances. In the paper we prove theoretically and experimentally how a recent model of social networks, "Affiliation Networks", offers an explanation to this phenomenon and inspires interesting techniques for local routing within social networks.

 

[Worth following]

 

“Non-Price Equilibria in Markets of Discrete Goods”, Avinatan Hassidim, Haim Kaplan, Yishay Mansour, Noam Nisan, EC, 2011.
We present a correspondence between markets of indivisible items, and a family of auction based n player games. We show that a market has a price based (Walrasian) equilibrium if and only if the corresponding game has a pure Nash equilibrium. We then turn to markets which do not have a Walrasian equilibrium (which is the interesting case), and study properties of the mixed Nash equilibria of the corresponding games.

 

[Worth following]

 

HCI

 

“From Basecamp to Summit: Scaling Field Research Across 9 Locations”, Jens Riegelsberger, Audrey Yang, Konstantin Samoylov, Elizabeth Nunge, Molly Stevens, Patrick Larvie, CHI 2011 Extended Abstracts.
The paper reports on our experience with a basecamp research hub to coordinate logistics and ongoing real-time analysis with research teams in the field. We also reflect on the implications for the meaning of research in a corporate context, where much of the value may be less in a final report, but more in the curated impressions and memories our colleagues take away from the research trip.

“User-Defined Motion Gestures for Mobile Interaction”, Jaime Ruiz, Yang Li, Edward Lank, CHI 2011: ACM Conference on Human Factors in Computing Systems, pp. 197-206.
Modern smartphones contain sophisticated sensors that can detect rich motion gestures — deliberate movements of the device by end-users to invoke commands. However, little is known about best-practices in motion gesture design for the mobile computing paradigm. We systematically studied the design space of motion gestures via a guessability study that elicits end-user motion gestures to invoke commands on a smartphone device. The study revealed consensus among our participants on parameters of movement and on mappings of motion gestures onto commands, by which we developed a taxonomy for motion gestures and compiled an end-user inspired motion gesture set. The work lays the foundation of motion gesture design—a new dimension for mobile interaction.

Information Retrieval

“Reputation Systems for Open Collaboration”, B.T. Adler, L. de Alfaro, A. Kulshrestra, I. Pye, Communications of the ACM, vol. 54 No. 8 (2011), pp. 81-87.
This paper describes content based reputation algorithms, that rely on automated content analysis to derive user and content reputation, and their applications for Wikipedia and Google Maps. The Wikipedia reputation system WikiTrust relies on a chronological analysis of user contributions to articles, metering positive or negative increments of reputation whenever new contributions are made. The Google Maps system Crowdsensus compares the information provided by users on map business listings and computes both a likely reconstruction of the correct listing and a reputation value for each user. Algorithmic-based user incentives ensure the trustworthiness of evaluations of Wikipedia entries and Google Maps business information.

Machine Learning and Data Mining

“Domain adaptation in regression”, Corinna Cortes, Mehryar Mohri, Proceedings of The 22nd International Conference on Algorithmic Learning Theory, ALT 2011.
Domain adaptation is one of the most important and challenging problems in machine learning. This paper presents a series of theoretical guarantees for domain adaptation in regression, gives an adaptation algorithm based on that theory that can be cast as a semi-definite programming problem, derives an efficient solution for that problem by using results from smooth optimization, shows that the solution can scale to relatively large data sets, and reports extensive empirical results demonstrating the benefits of this new adaptation algorithm.

“On the necessity of irrelevant variables”, David P. Helmbold, Philip M. Long, ICML, 2011
Relevant variables sometimes do much more good than irrelevant variables do harm, so that it is possible to learn a very accurate classifier using predominantly irrelevant variables. We show that this holds given an assumption that formalizes the intuitive idea that the variables are non-redundant. For problems like this it can be advantageous to add many additional variables, even if only a small fraction of them are relevant.

“Online Learning in the Manifold of Low-Rank Matrices”, Gal Chechik, Daphna Weinshall, Uri Shalit, Neural Information Processing Systems (NIPS 23), 2011, pp. 2128-2136.
Learning measures of similarity from examples of similar and dissimilar pairs is a problem that is hard to scale. LORETA uses retractions, an operator from matrix optimization, to learn low-rank similarity matrices efficiently. This allows learning similarities between objects like images or texts when represented using many more features than possible before.

Machine Translation

“Training a Parser for Machine Translation Reordering”, Jason Katz-Brown, Slav Petrov, Ryan McDonald, Franz Och, David Talbot, Hiroshi Ichikawa, Masakazu Seno, Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing (EMNLP '11).
Machine translation systems often need to understand the syntactic structure of a sentence to translate it correctly. Traditionally, syntactic parsers are evaluated as standalone systems against reference data created by linguists. Instead, we show how to train a parser to optimize reordering accuracy in a machine translation system, resulting in measurable improvements in translation quality over a more traditionally trained parser.

“Watermarking the Outputs of Structured Prediction with an application in Statistical Machine Translation”, Ashish Venugopal, Jakob Uszkoreit, David Talbot, Franz Och, Juri Ganitkevitch, Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing (EMNLP).
We propose a general method to watermark and probabilistically identify the structured results of machine learning algorithms with an application in statistical machine translation. Our approach does not rely on controlling or even knowing the inputs to the algorithm and provides probabilistic guarantees on the ability to identify collections of results from one’s own algorithm, while being robust to limited editing operations.

“Inducing Sentence Structure from Parallel Corpora for Reordering”, John DeNero, Jakob Uszkoreit, Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing (EMNLP).
Automatically discovering the full range of linguistic rules that govern the correct use of language is an appealing goal, but extremely challenging. Our paper describes a targeted method for discovering only those aspects of linguistic syntax necessary to explain how two different languages differ in their word ordering. By focusing on word order, we demonstrate an effective and practical application of unsupervised grammar induction that improves a Japanese to English machine translation system.

Multimedia and Computer Vision

“Kernelized Structural SVM Learning for Supervised Object Segmentation”, Luca Bertelli, Tianli Yu, Diem Vu, Burak Gokturk, Proceedings of IEEE Conference on Computer Vision and Pattern Recognition 2011.
The paper proposes a principled way for computers to learn how to segment the foreground from the background of an image given a set of training examples. The technology is built upon a specially designed nonlinear segmentation kernel under the recently proposed structured SVM learning framework.

“Auto-Directed Video Stabilization with Robust L1 Optimal Camera Paths”, Matthias Grundmann, Vivek Kwatra, Irfan Essa, IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2011).
Casually shot videos captured by handheld or mobile cameras suffer from significant amount of shake. Existing in-camera stabilization methods dampen high-frequency jitter but do not suppress low-frequency movements and bounces, such as those observed in videos captured by a walking person. On the other hand, most professionally shot videos usually consist of carefully designed camera configurations, using specialized equipment such as tripods or camera dollies, and employ ease-in and ease-out for transitions. Our stabilization technique automatically converts casual shaky footage into more pleasant and professional looking videos by mimicking these cinematographic principles. The original, shaky camera path is divided into a set of segments, each approximated by either constant, linear or parabolic motion, using an algorithm based on robust L1 optimization. The stabilizer has been part of the YouTube Editor (youtube.com/editor) since March 2011.

“The Power of Comparative Reasoning”, Jay Yagnik, Dennis Strelow, David Ross, Ruei-Sung Lin, International Conference on Computer Vision (2011).
The paper describes a theory derived vector space transform that converts vectors into sparse binary vectors such that Euclidean space operations on the sparse binary vectors imply rank space operations in the original vector space. The transform a) does not need any data-driven supervised/unsupervised learning b) can be computed from polynomial expansions of the input space in linear time (in the degree of the polynomial) and c) can be implemented in 10-lines of code. We show competitive results on similarity search and sparse coding (for classification) tasks.

NLP

“Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections”, Dipanjan Das, Slav Petrov, Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics (ACL '11), 2011, Best Paper Award.
We would like to have natural language processing systems for all languages, but obtaining labeled data for all languages and tasks is unrealistic and expensive. We present an approach which leverages existing resources in one language (for example English) to induce part-of-speech taggers for languages without any labeled training data. We use graph-based label propagation for cross-lingual knowledge transfer and use the projected labels as features in a hidden Markov model trained with the Expectation Maximization algorithm.

Networks

“TCP Fast Open”, Sivasankar Radhakrishnan, Yuchung Cheng, Jerry Chu, Arvind Jain, Barath Raghavan, Proceedings of the 7th International Conference on emerging Networking EXperiments and Technologies (CoNEXT), 2011.
TCP Fast Open enables data exchange during TCP’s initial handshake. It decreases application network latency by one full round-trip time, a significant speedup for today's short Web transfers. Our experiments on popular websites show that Fast Open reduces the whole-page load time over 10% on average, and in some cases up to 40%.

“Proportional Rate Reduction for TCP”, Nandita Dukkipati, Matt Mathis, Yuchung Cheng, Monia Ghobadi, Proceedings of the 11th ACM SIGCOMM Conference on Internet Measurement 2011, Berlin, Germany - November 2-4, 2011.
Packet losses increase latency of Web transfers and negatively impact user experience. Proportional rate reduction (PRR) is designed to recover from losses quickly, smoothly and accurately by pacing out retransmissions across received ACKs during TCP’s fast recovery. Experiments on Google Web and YouTube servers in U.S. and India demonstrate that PRR reduces the TCP latency of connections experiencing losses by 3-10% depending on response size.

Security and Privacy

“Automated Analysis of Security-Critical JavaScript APIs”, Ankur Taly, Úlfar Erlingsson, John C. Mitchell, Mark S. Miller, Jasvir Nagra, IEEE Symposium on Security & Privacy (SP), 2011.
As software is increasingly written in high-level, type-safe languages, attackers have fewer means to subvert system fundamentals, and attacks are more likely to exploit errors and vulnerabilities in application-level logic. This paper describes a generic, practical defense against such attacks, which can protect critical application resources even when those resources are partially exposed to attackers via software interfaces. In the context of carefully-crafted fragments of JavaScript, the paper applies formal methods and semantics to prove that these defenses can provide complete, non-circumventable mediation of resource access; the paper also shows how an implementation of the techniques can establish the properties of widely-used software, and find previously-unknown bugs.

“App Isolation: Get the Security of Multiple Browsers with Just One”, Eric Y. Chen, Jason Bau, Charles Reis, Adam Barth, Collin Jackson, 18th ACM Conference on Computer and Communications Security, 2011.
We find that anecdotal advice to use a separate web browser for sites like your bank is indeed effective at defeating most cross-origin web attacks. We also prove that a single web browser can provide the same key properties, for sites that fit within the compatibility constraints.

Speech

“Improving the speed of neural networks on CPUs”, Vincent Vanhoucke, Andrew Senior, Mark Z. Mao, Deep Learning and Unsupervised Feature Learning Workshop, NIPS 2011.
As deep neural networks become state-of-the-art in real-time machine learning applications such as speech recognition, computational complexity is fast becoming a limiting factor in their adoption. We show how to best leverage modern CPU architectures to significantly speed-up their inference.

“Bayesian Language Model Interpolation for Mobile Speech Input”, Cyril Allauzen, Michael Riley, Interspeech 2011.
Voice recognition on the Android platform must contend with many possible target domains - e.g. search, maps, SMS. For each of these, a domain-specific language model was built by linearly interpolating several n-gram LMs from a common set of Google corpora. The current work has found a way to efficiently compute a single n-gram language model with accuracy very close to the domain-specific LMs but with considerably less complexity at recognition time.

Statistics

“Large-Scale Parallel Statistical Forecasting Computations in R”, Murray Stokely, Farzan Rohani, Eric Tassone, JSM Proceedings, Section on Physical and Engineering Sciences, 2011.
This paper describes the implementation of a framework for utilizing distributed computational infrastructure from within the R interactive statistical computing environment, with applications to timeseries forecasting. This system is widely used by the statistical analyst community at Google for data analysis on very large data sets.

Structured Data

“Dremel: Interactive Analysis of Web-Scale Datasets”, Sergey Melnik, Andrey Gubarev, Jing Jing Long, Geoffrey Romer, Shiva Shivakumar, Matt Tolton, Communications of the ACM, vol. 54 (2011), pp. 114-123.
Dremel is a scalable, interactive ad-hoc query system. By combining multi-level execution trees and columnar data layout, it is capable of running aggregation queries over trillion-row tables in seconds. Besides continued growth internally to Google, Dremel now also backs an increasing number of external customers including BigQuery and UIs such as AdExchange front-end.

“Representative Skylines using Threshold-based Preference Distributions”, Atish Das Sarma, Ashwin Lall, Danupon Nanongkai, Richard J. Lipton, Jim Xu, International Conference on Data Engineering (ICDE), 2011.
The paper adopts a principled approach towards representative skylines and formalizes the problem of displaying k tuples such that the probability that a random user clicks on one of them is maximized. This requires mathematically modeling (a) the likelihood with which a user is interested in a tuple, as well as (b) how one negotiates the lack of knowledge of an explicit set of users. This work presents theoretical and experimental results showing that the suggested algorithm significantly outperforms previously suggested approaches.

“Hyper-local, directions-based ranking of places”, Petros Venetis, Hector Gonzalez, Alon Y. Halevy, Christian S. Jensen, PVLDB, vol. 4(5) (2011), pp. 290-30.
Click through information is one of the strongest signals we have for ranking web pages. We propose an equivalent signal for ranking real world places: the number of times that people ask for precise directions to the address of the place. We show that this signal is competitive in quality with human reviews while being much cheaper to collect. We also show that the signal can be incorporated efficiently into a location search system.

Systems

“Power Management of Online Data-Intensive Services”, David Meisner, Christopher M. Sadler, Luiz André Barroso, Wolf-Dietrich Weber, Thomas F. Wenisch, Proceedings of the 38th ACM International Symposium on Computer Architecture, 2011.
Compute and data intensive Web services (such as Search) are a notoriously hard target for energy savings techniques. This article characterizes the statistical hardware activity behavior of servers running Web search and discusses the potential opportunities of existing and proposed energy savings techniques.

“The Impact of Memory Subsystem Resource Sharing on Datacenter Applications”, Lingjia Tang, Jason Mars, Neil Vachharajani, Robert Hundt, Mary-Lou Soffa, ISCA, 2011.
In this work, the authors expose key characteristics of an emerging class of Google-style workloads and show how to enhance system software to take advantage of these characteristics to improve efficiency in data centers. The authors find that across datacenter applications, there is both a sizable benefit and a potential degradation from improperly sharing micro-architectural resources on a single machine (such as on-chip caches and bandwidth to memory). The impact of co-locating threads from multiple applications with diverse memory behavior changes the optimal mapping of thread to cores for each application. By employing an adaptive thread-to-core mapper, the authors improved the performance of the datacenter applications by up to 22% over status quo thread-to-core mapping, achieving performance within 3% of optimal.

“Language-Independent Sandboxing of Just-In-Time Compilation and Self-Modifying Code”, Jason Ansel, Petr Marchenko, Úlfar Erlingsson, Elijah Taylor, Brad Chen, Derek Schuff, David Sehr, Cliff L. Biffle, Bennet S. Yee, ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), 2011.
Since its introduction in the early 90's, Software Fault Isolation, or SFI, has been a static code technique, commonly perceived as incompatible with dynamic libraries, runtime code generation, and other dynamic code. This paper describes how to address this limitation and explains how the SFI techniques in Google Native Client were extended to support modern language implementations based on just-in-time code generation and runtime instrumentation. This work is already deployed in Google Chrome, benefitting millions of users, and was developed over a summer collaboration with three Ph.D. interns; it exemplifies how Research at Google is focused on rapidly bringing significant benefits to our users through groundbreaking technology and real-world products.

“Thialfi: A Client Notification Service for Internet-Scale Applications”, Atul Adya, Gregory Cooper, Daniel Myers, Michael Piatek, Proc. 23rd ACM Symposium on Operating Systems Principles (SOSP), 2011, pp. 129-142.
This paper describes a notification service that scales to hundreds of millions of users, provides sub-second latency in the common case, and guarantees delivery even in the presence of a wide variety of failures. The service has been deployed in several popular Google applications including Chrome, Google Plus, and Contacts.


Translation in progress.
 

posted @ 2012-03-24 11:39 石建 | Fat Mind

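Started trying Clojure today; these are notes on the problems I ran into and what I learned.

1. Getting to know Clojure
http://metaphy.iteye.com/blog/458872

2. Starting with Hello World
    - Set up the development environment (anyone coming from Java is surely used to Eclipse)
      Online installation is slower than a tortoise; installing the plugin entirely by hand is recommended
      (installing Eclipse plugins manually: http://www.blogjava.net/shijian/archive/2012/03/18/372141.html
      offline zip: http://roysong.iteye.com/blog/1260147)
    - Get it running
        - Warm up with the plain console first: http://clojure.org/getting_started
        - Develop in Eclipse (reminder: clojure-xxx.jar must be added to the classpath)
        - Read http://www.ibm.com/developerworks/cn/opensource/os-eclipse-clojure/, then practice

3. How to learn
   http://weiyongqing.iteye.com/blog/1441743
    Quote: "I should take it one step at a time: first type through everything in core on the Clojure doc site, then study 孙宁's RPC framework, and work on 4clojure problems in my spare time."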


posted @ 2012-03-18 23:33 石建 | Fat Mind

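I. Keyboard shortcuts

1. Common shortcuts
    a. ctrl + h  search for content
    b. ctrl + shift + r  quickly open a resource file
    c. ctrl + shift + t  quickly open a class file
    d. alt + shift + o  toggle the "select a word and every identical occurrence is shaded" highlighting

2. How to define your own shortcuts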
    


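II. Plugins

Must read:
    http://wiki.eclipse.org/FAQ_How_do_I_install_new_plug-ins%3F (why the Eclipse update manager is recommended)
    http://www.venukb.com/2006/08/20/install-eclipse-plugins-the-easy-way/ (mainly explains the "manual install" approach)

1. Ways to install a plugin
    1.1 Online installation
          The official wiki explains it clearly. Advantages: 1. dependencies between plugins and version compatibility are managed for you; 2. just like software installed on Windows, a plugin you no longer need is easy to uninstall through the update manager, and a growing set of plugins stays manageable.
    The Eclipse wiki's view of manual install: This obviously is a more dangerous approach, as no certification takes place about the suitability of the plug-in; it may rely on other plug-ins not available in your installation. In the case of compatibility conflicts, you won’t find out until you use the plug-in that it might break.
        Unfortunately the network is often unreliable and installation still fails after many attempts; that is the real reason people fall back to manual install.
    1.2 Manual installation
        a. First approach: download the plugin, unpack it, and copy its features and plugins directories into the matching directories under %eclipse_home%
        As in this picture: http://static.flickr.com/75/219742315_9ee663e2c8_o.png
        Advantage: dead simple. Disadvantage: exactly the things the update manager does for you, namely plugin dependencies, version compatibility, and later management, all have to be handled by hand.
        b. Second approach: use a .link file, which solves the "later management" problem
             b-1. Create a links directory under the Eclipse directory
             b-2. Create the corresponding .link file, e.g. subversive.link
             b-3. Create subversive/eclipse/ and copy features and plugins into it
             b-4. Edit subversive.link, e.g. path=E:/dev/eclipse-t/thrid-plugins/subversive
             b-5. Restart Eclipse (after restarting I found that using svn also requires the Subversive connector, which confirms the drawback of manual installation)
         c. Tips:
                   - When installing a plugin manually, read its prerequisites carefully (otherwise problems are very hard to track down).
                    e.g. the prerequisites of m2eclipse: Subclipse and Mylyn.
                    or “Pre-requisite: an Eclipse version including Java Support (e.g. with the JDT : Java Development Tools, as in Eclipse For Java Developers, Eclipse For RCP/RAP developers, Eclipse for JavaEE developers, etc.)” http://code.google.com/p/counterclockwise/wiki/Documentation#Install_Counterclockwise_plugin
                   - When installing a plugin manually, the path in the .link file must be an absolute path


Summary: for Eclipse plugins, try the update manager first; fall back to manual installation only when the network does not cooperate and the installation fails.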

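2. Collected plugin resources

2.1 Installing the m2eclipse plugin
    1) Prerequisites
        a. Eclipse 3.2 or later (can be ignored; the Eclipse in use is normally 3.5 or later)
        b. A JDK newer than 1.4; Eclipse must run on a JDK, not a JRE
        c. These plugins must be installed first: Subclipse (svn) and Mylyn (task management); Mylyn ships with Eclipse 3.5 and later, so it needs no installation
        Online install address for the svn plugin (given the unreliable network, downloading the zip and choosing the local file under "archive" is the better option)
        http://subclipse.tigris.org/servlets/ProjectProcess;jsessionid=290480ED68C2C7E781DCCE66CE657FC2?pageID=p4wYuA
     2) Install m2eclipse; I found no zip that can be downloaded locally, so it has to be installed online: http://www.eclipse.org/m2e/download/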

posted @ 2012-03-18 21:10 石建 | Fat Mind

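Preface: while writing unit tests I ran into a problem mocking a generic return type, so I went back over generics again (reading the generics chapter of Core Java)



XMind format (downloadable): the notes were taken in XMind while reviewing

The problem from the unit test, simplified: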

import java.sql.Timestamp;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;

import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.when;

public class TestMain {
    public List<? extends Date> getDateT() {
        return null;
    }
    public List<Date> getDate() {
        return null;
    }
    public void mockGetDate() {
        TestMain main = mock(TestMain.class);
        when(main.getDate()).thenReturn(new ArrayList<Date>()); // compiles
        /*
         * The method thenReturn(List<capture#2-of ? extends Date>) in the type
         * OngoingStubbing<List<capture#2-of ? extends Date>>
         * is not applicable for the arguments (ArrayList<Date>)
         */
        when(main.getDateT()).thenReturn(new ArrayList<Date>()); // compile error
        when(main.getDateT()).thenReturn(new ArrayList<Timestamp>()); // compile error
        when(main.getDateT()).thenReturn(new ArrayList<Object>()); // compile error
        when(main.getDateT()).thenReturn(new ArrayList()); // compiles (raw type, unchecked warning)
    }
}

I still do not understand this; can anyone explain it to me?
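One possible explanation, offered as a sketch rather than a verified answer: the static type of `main.getDateT()` is `List<? extends Date>`, and when that expression is passed to a generic method the compiler replaces the wildcard with a fresh, unnamed type (the "capture#2-of ? extends Date" in the error message). The stubbing object is then parameterized with that unnamed type, and no `ArrayList<X>` you can write is known to match it; only the raw `ArrayList` slips through via an unchecked conversion. The classes below are hypothetical stand-ins (`Stubbing` and `when` are not Mockito's classes) that reproduce the same shape without any library, and show that naming the type argument explicitly avoids the capture.

```java
import java.util.ArrayList;
import java.util.Date;
import java.util.List;

class WildcardCaptureDemo {
    // Hypothetical stand-in for a stubbing object: thenReturn accepts exactly a T.
    static final class Stubbing<T> {
        private T value;
        Stubbing<T> thenReturn(T v) { this.value = v; return this; }
        T get() { return value; }
    }

    // Hypothetical stand-in for when(...): T is inferred from the argument's static type.
    static <T> Stubbing<T> when(T ignoredCallResult) { return new Stubbing<T>(); }

    static List<? extends Date> getDateT() { return null; }

    // Workaround: spell out T as List<? extends Date>, so no capture variable is involved.
    // A Stubbing<List<? extends Date>> accepts any List<X> where X extends Date.
    static List<? extends Date> stubWithExplicitType() {
        Stubbing<List<? extends Date>> s =
                WildcardCaptureDemo.<List<? extends Date>>when(getDateT());
        s.thenReturn(new ArrayList<Date>());
        return s.get();
    }

    public static void main(String[] args) {
        // when(getDateT()).thenReturn(new ArrayList<Date>());
        // The line above does not compile: T is inferred as List<capture of ? extends Date>.
        System.out.println(stubWithExplicitType().size());
    }
}
```

With Mockito itself the analogous moves would presumably be an explicit type argument on `when`, or the `doReturn(...).when(mock)` form, which takes a plain `Object`; neither was checked against the Mockito version used in the test above.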
posted @ 2012-03-08 21:05 石建 | Fat Mind

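1. Application jar conflicts
    A log4j conflict made the application fail with a class cast error.
    Need: find out which jar a given class is actually loaded from. The -verbose:class option (or -XX:+TraceClassLoading) logs in detail which classes are loaded and which jar each comes from.

See: http://agapple.iteye.com/blog/946603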
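As a complement to the JVM flags, the same question can be asked from inside the running application. This is a minimal sketch, assuming no security manager blocks `getProtectionDomain()`; the class name `WhichJar` and the method `locationOf` are made up for illustration.

```java
import java.security.CodeSource;

class WhichJar {
    /** Returns the URL a class was loaded from, or a note when the JVM reports no code source. */
    static String locationOf(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            // Typical for classes loaded by the bootstrap loader, such as java.lang.String.
            return "(no code source: bootstrap or JDK class)";
        }
        return source.getLocation().toString();
    }

    public static void main(String[] args) {
        System.out.println(locationOf(String.class));
        System.out.println(locationOf(WhichJar.class));
    }
}
```

Calling `locationOf` on the conflicting logger class in each module should show which copy of the jar won, without restarting the JVM with extra flags.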

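2. During performance testing
   Is there a Linux command or tool that collects cpu, load, context switches, mem, network IO, and disk IO data at the same time?
   A detailed explanation of vmstat's output? -> graphical reports (the painful part is watching and recording the data by hand, which is a disgrace for a programmer)
   (vmstat's IO figures cover block devices such as disks; a network card has no corresponding device file (http://oss.org.cn/kernel-book/ch11/11.2.3.htm), so use iftop for network IO statistics)
   vmstat http://linux.about.com/library/cmd/blcmdl8_vmstat.htm

3. JBoss startup error
java.sql.SQLException: Table already exists: JMS_MESSAGES in statement [CREATE CACHED TABLE JMS_MESSAGES]
See: http://dinghaoliang.blog.163.com/blog/static/126540714201082764733272/
The file %jboss_home%/server/default/deploy/hsqldb-ds.xml contains a DefaultDS data source configuration; the temporary fix is to delete hsqldb-ds.xml. The cause is unknown.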

4. logback 0.9.19 introduces <encoder> and drops the <layout> nested directly inside <appender>
        <encoder>
            <pattern>%m%n</pattern>
            <charset class="java.nio.charset.Charset">UTF-8</charset>
        </encoder>

Source: OutputStreamAppender.java
  protected void writeOut(E event) throws IOException {
    this.encoder.doEncode(event);
  }
Stepping through with a debugger showed that the charset of the log file only takes effect when configured this way; otherwise the system default encoding is used.

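5. Setting the Linux system encoding
http://linux.vbird.org/linux_basic/0320bash.php#variable_locale
Setting the "system encoding" really means setting the corresponding environment variables, so any file that can set environment variables can set the encoding; export makes the setting take effect
locale shows the encoding used by the current user; locale -a lists every encoding the machine supports
Default settings:
  a. System level: /etc/profile -> /etc/sysconfig/i18n, set LANG (no explicit export needed) (aside: i18n also has a LANGUAGE setting; I do not know what it means, and deleting it had no effect)
  b. User level: ~/.bashrc, ~/.bash_profile, ~/.bash_login, ~/.profile, read with priority from left to right; an explicit export is required

Once LANG or LC_ALL is set, the other locale variables are overridden by these two. In short: setting LANG for the current user is the best option.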

posted @ 2011-12-15 15:55 石建 | Fat Mind

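Symptoms and how to approach them:

1. The business changes quickly, so constant communication is needed?
    a. Features 1 and 2 started together and were later split so that 1 ships first; then even for feature 1 the teams went live at different times (answer: hard to solve; develop more product sense and help the product side with the analysis)
2. Development:
    a. The existing code is inelegant and there is always an urge to refactor (answer: merely wanting to is useless; it has to produce a result)
    b. Waiting on each other because interfaces are not clearly defined, e.g. the interface jar cannot be delivered, or the comments are unclear (answer: agree on deadlines and put them in writing; talk to each other first, then go to a manager)
    c. Insufficient preparation for integration testing, e.g. the hsf publish failed because of the IP, or the main site's page configuration was modified (answer: state the requirements up front; there must be no logic problems)
3. Release
    a. Too many parties are involved and information gets lost on the way, e.g. "only pause auditing" turned into "stop all crm operations" (answer: always notify every relevant person directly)
    b. The risk assessment was incomplete and did not consider the whole picture, e.g. the simbacall->bp parameter objects are inconsistent, and whether that causes failures was never integration-tested (answer: do not miss or underestimate any risk)


2011-11-18, 19:30 in the evening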
posted @ 2011-11-18 19:39 石建 | Fat Mind

Annotation

 

Preface: for any question about Spring, remember to consult the Spring reference first

I. Background of annotations

Annotations do not directly affect program semantics, but they do affect the way programs are treated by tools and libraries, which can in turn affect the semantics of the running program. Annotations can be read from source files, class files, or reflectively at run time.

 

参考:http://www.developer.com/print.php/3556176

1.       Definition

Annotation type declarations are similar to normal interface declarations. An at-sign (@) precedes the interface keyword. Each method declaration defines an element of the annotation type. Method declarations must not have any parameters or a throws clause. Return types are restricted to primitives, String, Class, enums, annotations, and arrays of the preceding types. Methods can have default values.

 

2.       Java annotation

There are two types of annotations available with JDK5:

1) Simple annotations: These are the basic types supplied with Tiger, which you can use to annotate your code only; you cannot use those to create a custom annotation type.

There are three basic annotations: Override, Deprecated, and SuppressWarnings. They cannot be used to define new annotations.

2) Meta annotations: These are the annotation types designed for annotating annotation-type declarations. Simply speaking, these are called the annotations-of-annotations.

Meta annotations define annotation types, e.g. Target, Retention, Documented, Inherited.

A.      Target: declares which kind of program element the annotation can be applied to, e.g. @Target(ElementType.TYPE), @Target(ElementType.METHOD)

B.      Retention: declares how long the annotation is retained, e.g. RetentionPolicy.SOURCE, RetentionPolicy.CLASS, RetentionPolicy.RUNTIME

C.      Documented: declares whether the annotation is shown in the generated documentation of the annotated target.

D.      Inherited: xxx
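To make the definition rules and the Target and Retention meta annotations concrete, here is a small sketch. The annotation `Audit`, its elements, and the `Service` class are invented for illustration; only the `java.lang.annotation` and reflection calls are standard.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

class AnnotationDemo {
    // Hypothetical annotation: allowed on methods only, and kept until run time.
    @Target(ElementType.METHOD)
    @Retention(RetentionPolicy.RUNTIME)
    @interface Audit {
        String action();          // element without a default: must be supplied
        int level() default 1;    // element with a default value
    }

    static class Service {
        @Audit(action = "delete")
        public void remove() { }

        public void list() { }
    }

    /** Reads the annotation reflectively; returns "none" when the method is not annotated. */
    static String describe(String methodName) {
        try {
            Method m = Service.class.getMethod(methodName);
            Audit a = m.getAnnotation(Audit.class);
            return a == null ? "none" : a.action() + ":" + a.level();
        } catch (NoSuchMethodException e) {
            return "no such method";
        }
    }

    public static void main(String[] args) {
        System.out.println(describe("remove"));
        System.out.println(describe("list"));
    }
}
```

If the retention were SOURCE or CLASS instead of RUNTIME, `getAnnotation` would return null here, which is the practical difference between the three retention policies.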

 

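3.       What annotations are for

a. Without changing the program's original semantics, added information + tools = declarative programming, e.g. Spring AOP

b. Compile-time checks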

 

 

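II. Implementing AOP with annotations

1. Approach

A. Define the rule with a custom annotation (when to run)

B. Mark the behavior (what to run)

C. Generate a proxy through reflection; whenever the object executes any method, intercept the call and check whether it matches the rule; if it does, run the corresponding behavior.

2. Example

Scenario: a performer gives a performance, and the audience applauds afterwards

Original design: the performer holds references to every audience member and tells them to applaud once the performance is over

Problem: the performer should care only about performing; which audience members there are and whether they applaud is not the performer's concern

Improvement: with AOP, the performer only performs, and someone else is responsible for the applause

Summary:

Aspect-oriented programming (AOP) complements object-oriented programming (OOP) by offering another way to think about program structure. OOP decomposes an application into a hierarchy of objects, while AOP decomposes it into aspects, or concerns. This makes it possible to modularize concerns such as transaction management that cut across multiple objects.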
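The performer and audience scenario can be sketched with nothing but a custom annotation and a JDK dynamic proxy. This is an illustration of steps A to C under my own naming (`Applaud`, `Performer`, `Pianist`, `withAudience` are all hypothetical), not how Spring implements it.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

class ApplauseAopDemo {
    // Step A: the rule. Methods marked @Applaud are followed by applause.
    @Target(ElementType.METHOD)
    @Retention(RetentionPolicy.RUNTIME)
    @interface Applaud { }

    interface Performer {
        @Applaud void perform();
        void rest();
    }

    // The performer only performs; it knows nothing about the audience.
    static class Pianist implements Performer {
        private final List<String> log;
        Pianist(List<String> log) { this.log = log; }
        public void perform() { log.add("perform"); }
        public void rest() { log.add("rest"); }
    }

    // Step C: a proxy intercepts every call and applies the behavior (step B) when the rule matches.
    static Performer withAudience(final Performer target, final List<String> log) {
        InvocationHandler handler = new InvocationHandler() {
            public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
                Object result;
                try {
                    result = method.invoke(target, args);
                } catch (InvocationTargetException e) {
                    throw e.getCause();   // rethrow what the target actually threw
                }
                if (method.isAnnotationPresent(Applaud.class)) {
                    log.add("applause");
                }
                return result;
            }
        };
        return (Performer) Proxy.newProxyInstance(
                Performer.class.getClassLoader(), new Class<?>[] { Performer.class }, handler);
    }

    static List<String> run() {
        List<String> log = new ArrayList<String>();
        Performer p = withAudience(new Pianist(log), log);
        p.perform();
        p.rest();
        return log;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

The annotation sits on the interface method because the `Method` handed to `invoke` is the interface's method object, not the implementation's.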

 

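III. How Spring AOP uses annotations

1. The basic way, through ProxyFactoryBean

a. Advice

b. Pointcut

c. Advice + pointcut → aspect

d. Target object + interface + aspect → interface proxy

2. Auto-proxying:

The problem with the basic way is that the XML configuration is tedious.

1) Auto-proxying of advisor beans based on the Spring context (using BeanPostProcessor?)

BeanNameAutoProxyCreator has the properties beanNames (the objects to proxy) and interceptorNames (the advice). It automatically proxies the beans named in beanNames and applies the advice named in interceptorNames, but each interceptor has to implement a specific interface such as MethodBeforeAdvice

2) AspectJ annotation-driven

Uses AspectJ-style annotations. How it works: a proxy is generated, at method level only. A library provided by AspectJ does the pointcut parsing and matching.

3.       <aop:config>. Advantage: any class can be turned into an aspect with no special interface or annotation; it stays a plain POJO

4.       Passing parameters in AOP, see: Spring reference 6.3.3.6, advice parameters

a. Around advice must take a ProceedingJoinPoint parameter, through which all information about the method is available.

b. Other kinds of advice use JoinPoint to get all of the method's arguments.

c. Passing parameters by binding on argument names: not understood yet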

 

 

IV. Technologies related to annotations

1. cglib

It works by using Enhancer to generate a subclass of the original class and setting a proxy as its callback; every method call on the original class is then routed to the intercept() method of that proxy, which implements the MethodInterceptor interface.

2.  asm

ASM is an all purpose Java bytecode manipulation and analysis framework. It can be used to modify existing classes or dynamically generate classes, directly in binary form. Provided common transformations and analysis algorithms allow to easily assemble custom complex transformations and code analysis tools.

ASM offers similar functionality as other bytecode frameworks, but it is focused on simplicity of use and performance.

Put simply, ASM is a lower-level code-generation library than cglib (cglib is built on top of ASM)

 

V. Other questions

1. Generics

Generics - This long-awaited enhancement to the type system allows a type or method to operate on objects of various types while providing compile-time type safety. It adds compile-time type safety to the Collections Framework and eliminates the drudgery of casting. See the Generics Tutorial. (JSR 14)

Generics provide compile-time type safety. Is that really so important?

2. Why must a pointcut declared with Spring AOP's AspectJ annotations be placed on a method? Because the method uniquely identifies the pointcut

Guess: the @Target of @Pointcut allows only METHOD, and the method signature becomes the pointcut's id

  

 

posted @ 2011-06-25 20:09 石建 | Fat Mind
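Reference: http://www.quora.com/What-are-some-interesting-and-innovative-startups-in-China

Is anything new still happening on the Chinese internet in 2011?

1. Group buying: 美团, 拉手, etc. (the cash-burning stage, fighting for users and traffic; user experience must improve, e.g. services such as refunds; prediction: within another year the group-buying giants will have emerged)
2. Online book rental: booksfly (creative, but the current way of operating can hardly grow big)
3. Image search: taotaosou (creative and promising; for now the user experience and the results are poor)

4. Microblogging: Sina, Tencent (thriving; Sina holds the dominant position)
5. Vertical social networks: Mtime, 果壳 (a new direction; vertical social sites will slowly eat into the market)
6. RSS services: 鲜果 (nothing distinctive of its own)

7. Browser games: 忍者村, 三国杀 (a profitable industry, but the short life cycle of games is worth keeping in mind)
8. Online music: 豆瓣电台, 酷狗 (I admire 豆瓣's minimalism, taken to the extreme)

9. Online storage: 微盘, DBBank

10. Mobile apps: 豌豆荚, VivaMe (reader/media), 街旁 (LBS)
11. UC乐园 (UC's social platform, even though UCWeb is an entry point with a built-in advantage)



 PS: what follows is pure personal musing, jotted down in passing

1. Parenting and child-education community + e-commerce: how to do it? Taobao's http://t.cn/blAkG is all static TMS pages [囧]. Ranking of dedicated parenting sites: http://t.cn/SwZSXr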

posted @ 2011-04-16 13:08 石建 | Fat Mind

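Main reference: 构建高性能web站点 (Building High-Performance Web Sites)


I. The network card

A network card implements the circuitry needed to communicate using a specific physical-layer and data-link-layer standard, such as Ethernet. This provides the foundation for a complete network protocol stack, allowing communication both among small groups of computers on the same LAN and across wide-area networks connected through routable protocols such as IP.

1.       Role:

a)         A unique MAC address that locates the machine (MAC addressing on a LAN/Ethernet)

b)        Receiving and sending data. It has a physical buffer.

                         i.              Receive: takes data from the physical layer and accesses memory via DMA.

                       ii.              Send: takes data from the upper layer and splits it into suitably sized packets to send.

Quoted from elsewhere:

Encapsulation and decapsulation: when sending, a header and a trailer are added to the data handed down from the layer above to form an Ethernet frame; when receiving, the header and trailer are stripped from the frame and the payload is passed to the layer above.
Link management: mainly the implementation of the CSMA/CD (Carrier Sense Multiple Access with Collision Detection) protocol.
Encoding and decoding: Manchester encoding and decoding.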

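2. Protocols

Ethernet is a technology for building computer local area networks.

ARP (Address Resolution Protocol): its basic function is to look up the MAC address of a target device from that device's IP address.

http://zh.wikipedia.org/zh/%E5%9C%B0%E5%9D%80%E8%A7%A3%E6%9E%90%E5%8D%8F%E8%AE%AE

3. Transfer rate

A network card's rate is its capacity to receive or send data per second, measured in Mbps (megabits per second). Because several Ethernet standards exist, cards come in several rates to match the Ethernet they are compatible with. Currently the rate is 10 Mbps on standard Ethernet, 100 Mbps on Fast Ethernet, and 1000 Mbps on Gigabit Ethernet.

The mainstream cards come in five kinds: 10 Mbps cards, 100 Mbps Ethernet cards, 10/100 Mbps auto-sensing cards, 1000 Mbps Gigabit Ethernet cards, and the newest 10-Gigabit cards. For ordinary home users a 10 Mbps or 10/100 Mbps auto-sensing card is enough; enterprise users are advised to buy 100 Mbps, Gigabit, or 10-Gigabit cards.

Ethernet cards and switching equipment support multiple rates, and devices auto-negotiate the best connection speed and duplex mode. If negotiation fails, a multi-rate device detects the rate used by the other side but defaults to half duplex. A 10/100 Ethernet port supports 10BASE-T and 100BASE-TX.

4.       Characteristics

a)         Full duplex

b)        Transfer rate

c)        Bus type: the PCI bus architecture is increasingly the bus of choice for network cards

d)        MAC address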

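II. How data is sent

1.       The data is written into the user process's memory address space; in practice, development code only has to assign to a runtime variable

2.       The application calls a system function, which copies the data from user-space memory into a region of memory maintained by the kernel, called the kernel buffer.

a)         The kernel buffer is limited in size; data to be sent enters it as a queue

b)        A certain amount of data is copied each time; the amount depends on the network packet size and the capacity of the kernel buffer

3.       Once the data is in the kernel buffer, the kernel notifies the network card controller to read it, and the CPU moves on to other tasks

a)         The network card copies the data to be sent from the kernel buffer into its own buffer

b)        The copy always proceeds at the width of the internal bus (e.g. with a 32-bit bus, 32 bits are copied at a time)

4.       The network card sends the data onto the physical line

a)         The data has to be converted from bytes to bits (that is, sent out bit by bit in order)

b)        The card uses dedicated physical components to generate signals that can propagate. For copper wire, the card produces different electrical signals according to the changes between 0 and 1 bits; for optical fiber, it produces light signals.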

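III. Speed of electromagnetic waves

       Whether the signal is electrical or optical, once it enters the physical medium its propagation speed depends only on that medium: roughly 2.3×10^8 m/s for electrical signals in copper wire and roughly 2.0×10^8 m/s for light in optical fiber. Light travels at 3.0×10^8 m/s in a vacuum, so why is it slower in fiber? Mainly because glass has a refractive index of about 1.5, so light moves through it at roughly c/1.5; in addition, light is guided along the fiber by total internal reflection, so the path it travels is somewhat longer than the fiber itself.

       So the propagation speed of a signal in a given medium is practically a constant. In other words, no matter how fast the sending device pushes data onto the line as a signal, the signal can be considered to travel along the line at about the same speed.

       Fiber compared with copper? Fiber relies on total internal reflection, so the optical signal attenuates little and can therefore travel far.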

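IV. The concept of bandwidth

       From the analysis above, transferring data consists of the sender putting data onto the line plus propagation along the line, and the propagation speed is nearly the same across transmission media.

       Definition of bandwidth: the number of bits transferred per second, bit/s

       Seen this way, the only factor that affects bandwidth is "the sender putting data onto the line". How to improve it: a. raise the sending speed; b. raise the parallelism of the data transfer

1.       Sending speed

The ability of the sending device to push binary signals onto the line. The key point is that sending capacity cannot be raised if receiving capacity cannot keep up. Principle: the receiving speed determines the sending speed.

       This is the "flow control mechanism", which makes sure the receiver can take in the data without losing any. An example is the TCP sliding window (the basic idea of a sliding-window protocol: at any moment the sender and the receiver each maintain a contiguous range of frame sequence numbers they are allowed to send or receive; http://blog.csdn.net/yujun00/archive/2006/03/23/636495.aspx).

2.    Parallelism, equivalent to the concept of a computer bus. For example, with 32 bits, 32 bits of data can be transferred at the same instant.

Summary: clearly, the network card affects performance.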
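The split between "putting bits onto the line" and "propagation along the line" can be checked with a little arithmetic. The sketch below uses made-up inputs (a 1 MB payload, a 100 Mbps link, 1000 km of fiber) together with the fiber speed quoted earlier; the class and method names are mine.

```java
class TransferTimeDemo {
    /** Time for the sender to push the payload onto the line, governed by bandwidth. */
    static double transmissionSeconds(long bytes, double bitsPerSecond) {
        return bytes * 8.0 / bitsPerSecond;
    }

    /** Time for a signal to travel the length of the line, governed by the medium. */
    static double propagationSeconds(double meters, double metersPerSecond) {
        return meters / metersPerSecond;
    }

    public static void main(String[] args) {
        double send = transmissionSeconds(1_000_000L, 100e6);   // 1 MB over 100 Mbps
        double travel = propagationSeconds(1_000_000.0, 2.0e8);  // 1000 km of fiber
        System.out.println("transmission: " + send + " s");
        System.out.println("propagation:  " + travel + " s");
    }
}
```

Under these assumptions the sender needs 0.08 s while the signal needs only 0.005 s to cross 1000 km, which is why the sending side, and therefore the network card, dominates for payloads of this size.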

 

posted @ 2011-04-04 00:17 石建 | Fat Mind

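Preface: I have worked in Taobao's advertising technology department for almost a year; here are some of my own thoughts on advertising

Taobao currently has these main forms of advertising: CPT, CPC, CPS

1. CPT
(cost per time)
Billed by duration. Mostly brand advertising, focused on promoting brand image. Example: the scrolling ads in the middle of Taobao's screen, such as Dell or 九牧王. Characteristics: a. very expensive; b. few slots.
CPM
(Cost Per Thousand Impressions) Billed per thousand impressions. It gives the advertiser a steady number of impressions, but the effect is unknown and depends on the media and on how relevant the context is. It brings steady income to sites that have traffic.

2. CPC
(cost per click)
Billed per click. Examples: Google ads and Taobao's 直通车 (Zhitongche), both of which rank bids on keywords to decide which advertiser's ad is shown.
Most of Taobao's revenue currently comes from Zhitongche.
Example: when a user searches Taobao for "诺基亚 N73" (Nokia N73), the query is first normalized and matched to related bid keywords, and the search then runs on those bid keywords. The bid price, together with factors such as the user's reputation, decides whose ad is shown.
In the long run Zhitongche has two goals: 1. increase Taobao's revenue; 2. drive transactions. If it only maximized clicks, Taobao's revenue would rise, but sellers would get no transactions, their bids would inevitably fall, and both sides would lose. So Taobao's own revenue has to grow through Zhitongche on the premise of driving transactions.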

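3. CPS
(cost per sale)
Billed per completed sale. The story goes that someone from Taobao advertising saw this marketing model in a bookstore on a trip to Japan and, back at Taobao, decided to build 淘宝客 (Taobaoke)

The CPS model focuses on "long-tail traffic". The traffic of long-tail publishers is generally not of high quality, so billing by the three methods described above would be unfair to advertisers. Sharing revenue with long-tail publishers based on completed sales protects the advertiser's interests while still serving the advertiser's marketing and promotion goals, and it also brings income to the publisher.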
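The billing models above differ only in what is counted. A minimal sketch of the arithmetic follows; the prices, counts, and commission rate are made-up numbers, and the class name is hypothetical rather than anything from Taobao's systems.

```java
class AdBillingDemo {
    /** CPM: price is quoted per thousand impressions. */
    static double cpmCost(long impressions, double pricePerThousand) {
        return impressions / 1000.0 * pricePerThousand;
    }

    /** CPC: the advertiser pays for each click. */
    static double cpcCost(long clicks, double pricePerClick) {
        return clicks * pricePerClick;
    }

    /** CPS: the publisher receives a share of the completed sales amount. */
    static double cpsCost(double salesAmount, double commissionRate) {
        return salesAmount * commissionRate;
    }

    public static void main(String[] args) {
        // The same hypothetical campaign billed three ways.
        System.out.println("CPM: " + cpmCost(50_000, 10.0));
        System.out.println("CPC: " + cpcCost(200, 1.5));
        System.out.println("CPS: " + cpsCost(2000.0, 0.05));
    }
}
```

With low-quality traffic the impression and click counts stay high while sales stay low, so only the CPS figure shrinks with it, which is the fairness argument made above.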


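Some thoughts:

    Taobao currently has 800 million items, and Taobao's own ad slots alone cannot possibly meet sellers' promotion and marketing needs. So Taobao buys external ad slots (e.g. Youku) or cooperates with external sites on a revenue-sharing basis (Google).

    Taobao is in fact also building its own ad alliance, a kind of "land grab": accumulate enough traffic and channels, and then how the game is played is up to Taobao.

Taobao's external ad placement is currently opaque to customers: sellers do not know whether their ads are placed externally, and off-site ads perform much worse than ads inside Taobao (advertisers are given a discount based on transaction results). More precisely, "Taobao's advertisers get placed externally" without choosing to be.

    In 淘宝联盟 (Taobao Alliance), the only billing model that advertisers and site owners can currently choose themselves is CPT; when a slot has no ad, Taobao Alliance automatically pushes CPS ads. CPC ads cannot be promoted under Taobao Alliance's current way of managing site owners. The reason is that site owners have no clear tier structure. For CPC ads, traffic quality determines the final conversion rate, and CPC earns more than CPT; if large amounts of poor traffic carried CPC ads, the cpc (spend per click) would fall, which would certainly hit Taobao's ad revenue hard. To open the CPC model to external site owners, Taobao needs a clear tier structure and separate bidding for on-site and off-site traffic.

    External placement brings Taobao about 3 million in revenue per day, but publishers who actively fetch ads get no smart matching, no flexible customizable interface, and no variety of creatives. Once on-site CPC approaches saturation, developing external placement well becomes the inevitable direction for 淘宝直通车 (Zhitongche). CPS will naturally take the long-tail traffic, while how CPC captures the main mid-sized external sites is crucial.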


posted @ 2011-03-20 20:01 石建 | Fat Mind
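     Summary:  I. Java data types   A data type is an abstract expression of a memory location (many programming languages depend on the particular type of computer and on how the compiler implements the properties of data types, such as the size of the word and integer types; Java uses the JVM to guarantee that the storage size of its data does not change with the hardware). 1.   Primitive data type: A primiti...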
posted @ 2010-12-18 16:52 石建 | Fat Mind
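Preface: it has been a long time since I studied this late into the night; I miss my junior year at university.

I. Outline

1.       What is a "proxy"

2.       How the proxy pattern differs from the adapter and decorator patterns, and where each applies

3.       Hand-written proxy

4.       How dynamic proxies work

II. What is a "proxy"

       Example: a CEO has an assistant; anything that needs the CEO's attention is first filtered and organized by the assistant and then handed to the CEO. The assistant is the CEO's proxy.

       My own understanding: a proxy filters and controls data on behalf of the real executor, shielding it from other outside influences so that it can concentrate on what it is supposed to do.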

 

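III. How the proxy pattern differs from the adapter and decorator patterns, and where each applies

1. Proxy pattern

Head First definition: provide a surrogate or placeholder for another object to control access to it.


The figure above shows the structure of the proxy pattern.

Typical uses: remote access, access control, logging, and so on.

Decorator pattern; the structure of the IO class diagram is as follows:



Going from OutputStream → FileOutputStream → BufferedOutputStream, the functionality grows step by step, adding more behavior to the object.

My own understanding: the purposes differ. A proxy controls access to the proxied object; a decorator enhances what the decorated object can do and avoids overusing inheritance to implement different features.

Adapter pattern; the difference is clear from the class diagram, shown below:



The client calls ExecuteClass, but the interface that ExecuteClass exposes does not match what the client requires; when neither system can be modified, the adapter pattern solves the problem.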

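IV. Hand-written proxy



Scenario: fetch an Item by id; the proxy checks whether the user has permission to view the Item and writes a log entry. The concrete code is easy to write.

V. Dynamic proxy

For the scenario above, the steps with a dynamic proxy are:

1. From the interface, through a class loader, generate the Class object

Class clazz = Proxy.getProxyClass(ItemService.class.getClassLoader(), ItemService.class);

2. Through reflection, get the Class object's Constructor (note the parameter type the Constructor requires)

Constructor c = clazz.getConstructor(InvocationHandler.class);

3. Call newInstance() on the Constructor to create the instance

proxy = (ItemService)c.newInstance(this); // this is the InvocationHandler instance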
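Putting the three steps together for the Item scenario, here is a runnable sketch. It uses `Proxy.newProxyInstance`, which performs the same three steps in one call; the names `RealItemService`, `GuardHandler`, and `fetch` are invented for illustration, and the permission check is reduced to a boolean flag.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

class ItemProxyDemo {
    interface ItemService {
        String getItem(int id);
    }

    static class RealItemService implements ItemService {
        public String getItem(int id) { return "item-" + id; }
    }

    // The handler does the proxy's work: log the call, check permission, then delegate.
    static class GuardHandler implements InvocationHandler {
        private final ItemService target;
        private final boolean allowed;

        GuardHandler(ItemService target, boolean allowed) {
            this.target = target;
            this.allowed = allowed;
        }

        public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
            System.out.println("log: calling " + method.getName());
            if (!allowed) {
                throw new SecurityException("no permission to view the item");
            }
            try {
                return method.invoke(target, args);
            } catch (InvocationTargetException e) {
                throw e.getCause();   // surface the target's own exception
            }
        }
    }

    static ItemService newProxy(ItemService target, boolean allowed) {
        return (ItemService) Proxy.newProxyInstance(
                ItemService.class.getClassLoader(),
                new Class<?>[] { ItemService.class },
                new GuardHandler(target, allowed));
    }

    /** Convenience wrapper: returns the item, or "denied" when the permission check fails. */
    static String fetch(boolean allowed, int id) {
        try {
            return newProxy(new RealItemService(), allowed).getItem(id);
        } catch (SecurityException e) {
            return "denied";
        }
    }

    public static void main(String[] args) {
        System.out.println(fetch(true, 7));
        System.out.println(fetch(false, 7));
    }
}
```

The caller only ever sees the `ItemService` interface, so swapping the real service for the guarded proxy requires no change on the calling side.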

 

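Question to think about: how does this work underneath?

The class diagram of the proxy that is actually generated at run time for the scenario above. Every call on the proxy goes to super.h.invoke(); the user implements InvocationHandler and overrides invoke to control behavior per method.

The class diagram also explains why only interfaces can be proxied dynamically: the proxy itself has to extend Proxy, so proxying a class would mean extending two classes at once, which contradicts Java's lack of multiple inheritance.

The code below was copied from the web; it is the source of a generated proxy: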

import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.lang.reflect.UndeclaredThrowableException;

public final class $Proxy0 extends Proxy implements Manager {
    private static Method m1;
    private static Method m0;
    private static Method m3;
    private static Method m2;

    static {
        try {
            m1 = Class.forName("java.lang.Object").getMethod("equals",
                    new Class[] { Class.forName("java.lang.Object") });
            m0 = Class.forName("java.lang.Object").getMethod("hashCode", new Class[0]);
            m3 = Class.forName("com.ml.test.Manager").getMethod("modify", new Class[0]);
            m2 = Class.forName("java.lang.Object").getMethod("toString", new Class[0]);
        } catch (NoSuchMethodException nosuchmethodexception) {
            throw new NoSuchMethodError(nosuchmethodexception.getMessage());
        } catch (ClassNotFoundException classnotfoundexception) {
            throw new NoClassDefFoundError(classnotfoundexception.getMessage());
        }
    }

    public $Proxy0(InvocationHandler invocationhandler) {
        super(invocationhandler);
    }

    @Override
    public final boolean equals(Object obj) {
        try {
            return ((Boolean) super.h.invoke(this, m1, new Object[] { obj })).booleanValue();
        } catch (Throwable throwable) {
            throw new UndeclaredThrowableException(throwable);
        }
    }

    @Override
    public final int hashCode() {
        try {
            return ((Integer) super.h.invoke(this, m0, null)).intValue();
        } catch (Throwable throwable) {
            throw new UndeclaredThrowableException(throwable);
        }
    }
    // The methods for m3 (modify) and m2 (toString) are not part of this excerpt.
}


posted @ 2010-12-12 02:24 石建 | Fat Mind

题记:一直对ThreadLocal疑惑,听完facebook大牛演讲后,总结点东西。

一、ThreadLocal的作用,整体结构

二、源代码简单分析
  1.set方法
  2.get方法

三、使用场景实例 ibatis SqlMapClientImp

后记:折腾半天,文章的样式也调整不好,打包上传。但愿能帮到别人。
http://www.blogjava.net/Files/shijian/ThreadLocal.rar [请用“web版式视图”阅读]


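Open questions:

1. When is the Thread's ThreadLocalMap threadLocals field instantiated? When the thread is created?
A: No. On the first set(), the code checks whether the field is null and initializes it if so. (A first get() does the same through setInitialValue.)

2. What does ThreadLocalMap replaceStaleEntry(key, value, i) do?
A: It cleans out the stale entries it comes across and stores the current entry in the first stale slot that was found. Stale entries exist because Entry extends WeakReference: the ThreadLocal key is only weakly referenced, so once nothing else holds a strong reference to it, any garbage collection can reclaim it and leave behind an entry whose key is null.

3. ThreadLocalMap getEntryAfterMiss(ThreadLocal key, int i, Entry e)
A: It looks up an entry that is not stored at the index computed from its hash. Why can that happen? See 4: it follows from ThreadLocalMap's collision strategy.

4. Some differences from Map
Map strategy: a. same hash and same key: overwrite the value; b. same hash but different key: the new element becomes the head of a singly linked chain, and the previous head becomes its next element.
ThreadLocalMap strategy: case a is the same, but there are no chains as in case b. The ThreadLocal itself is the key. Its threadLocalHashCode is produced by an atomic AtomicInteger through getAndAdd(0x61c88647). The position in the Entry[] array is threadLocalHashCode & (length-1). In case b, the map keeps checking the next position in the Entry[] array until it finds one that can take the entry (same key, or null). When size >= threshold (len*2/3), it resizes to oldLen*2.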
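
The slot selection described in question 4 can be sketched as follows. This is a simplified model written for these notes, not the JDK source: ProbeDemo and its method names are made up, and findSlot assumes the table always has a free slot (the real map resizes before it fills up).

```java
import java.util.concurrent.atomic.AtomicInteger;

// Simplified model of how ThreadLocalMap picks a slot; not the JDK source.
final class ProbeDemo {
    private static final int HASH_INCREMENT = 0x61c88647;
    private static final AtomicInteger nextHashCode = new AtomicInteger();

    // Each new key takes the next value of a shared atomic counter.
    static int nextHash() {
        return nextHashCode.getAndAdd(HASH_INCREMENT);
    }

    // Home slot for a hash in a table whose length is a power of two.
    static int indexFor(int hash, int length) {
        return hash & (length - 1);
    }

    // Linear probing: start at the home slot and walk forward, wrapping
    // around, until a slot holds the same key or is empty.
    // Assumes at least one empty slot exists, otherwise this would not end.
    static int findSlot(Object[] keys, Object key, int hash) {
        int i = indexFor(hash, keys.length);
        while (keys[i] != null && keys[i] != key) {
            i = (i + 1) % keys.length;
        }
        return i;
    }

    public static void main(String[] args) {
        int length = 16;
        // Print the home slots of 16 consecutive hash codes.
        for (int n = 0; n < length; n++) {
            System.out.println(indexFor(nextHash(), length));
        }
    }
}
```

Because the increment is odd, consecutive hash codes land on different home slots of a power-of-two table, so collisions, and the extra probing that getEntryAfterMiss has to do, stay rare.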

posted @ 2010-12-11 18:50 石建 | Fat Mind
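Preface: I am quite dissatisfied with my current environment, but "people with ideas are not bound by their environment, even though an unsatisfying environment puts obstacles in their way".

   The last month of 2010 has begun, and it will be over soon, just as I have, in the blink of an eye, already been at Taobao for half a year! Back then I excitedly chose the advertising team, thinking it was something fresh that I had never touched, and the environment really is like that. But the business my team owns is mostly "repetitive communication and labor", and technology gets played down in such an environment. The root cause is that "the team's responsibilities do not match the direction I want to grow in", which is very painful for me. It is a torment! Luckily I have my wife to talk to and to comfort me; otherwise I might already have left (I have now given myself a deadline, and I will not stay in this environment forever). When 2010 ends, I hope I can happily tell myself: I have grown! Let me make a final sprint in this last month!

  What I want: to fill in my JVM theory, to read (around the JVM), and to stay calm (I want calm to ease the inner torment, to improve things through my own actions, and to get through each day satisfied).

  Postscript: buy a basketball and go play tomorrow morning. My belly keeps getting bigger, and I really cannot afford to gain more weight.



    Flipping through old notes, I found these thoughts written in 2010. The other day a senior colleague and I discussed what "technology" really means for a company and for an individual. We agreed that the ability to solve problems is what matters most. The question then is "how do you develop that ability?" We agreed that "the most effective way is to solve challenging problems". But "challenging problems do not come along every day at work". As Old Mao put it, "if the conditions are there, go ahead; if they are not, go ahead anyway": dig into the positioning of the team and the system, plan, fix unreasonable code, avoid repetitive work ...
   A person's soft skills are just as crucial. The 擎天柱 (Optimus Prime) training was quite rewarding, and I must try to apply the theory I learned in real situations, such as cross-team collaboration.
   My current mindset: first do the work wholeheartedly, then look back; whatever I am dissatisfied with, in myself or in the environment, I will change, and what I cannot change I will learn to let go.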

posted @ 2010-12-03 23:28 石建 | Fat Mind
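Preface: on Friday my colleagues discussed this question, and I heard the experiences of many "veterans". The lessons still have to be mulled over slowly and made my own before they will be of any help.

Looking back on my own learning:

I first encountered computers in high school. I still clearly remember the first class, where the teacher showed us how to power the machine on and off; I was full of excitement and some sense of inferiority. I only really studied computing because of my university major (regrettably, my fundamentals are not solid). My first programming language was C; during the Chinese New Year break of my freshman year I spent the whole winter vacation at home reading about it, and it felt like "a book written in heavenly script".
At first I mostly read books and listened to lectures, and learned little. Then I started looking for material myself, using baidu and forums (csdn), and tried writing code. Gradually I formed some ideas of my own and studied beyond the curriculum (many thanks to eMule, where much of the material came from), used google and javaeye, and read other people's blogs. After writing a few small things, I tried to understand the principles behind them and debugged into the source code. When learning something new I first read the official tutorial and ran small examples. I followed industry news and new technology (through reader subscriptions), consciously summarized what I had understood, and became more rigorous and professional in my work ... That is basically where I am now.

Points I noted during the discussion as helpful to me:

1. Learn to use it, understand the principles, compare with similar products, and raise your level of understanding
Note: understanding the principles (my understanding is not deep enough); comparing with similar products (I do not have this habit yet) (importance: high)

2. Go from points to the whole picture, integrate, and form your own body of knowledge
Note: I am slowly becoming aware of this (importance: high)

3. In an information explosion, learn to filter; take notes on what you read
Note: filtering (I do this badly; I must read selectively, using reader as the tool); note taking (I have notes for part of my reading; for anything I consider important I must write down my own understanding) (importance: high)

4. Learn "related" knowledge
Note: for example, my job is developing the advertising front-end application, so I should deliberately learn about the engine and the algorithms (not the details, but I must understand the whole) (importance: high)

5. Helping others and sharing is actually a good way to improve yourself
Note: when energy permits, be good at helping others solve problems (reason: the problems I run into myself are always limited, and it also increases my influence)

6. Knowledge a Java developer should expand into
Note: a. mathematics (algorithms)   b. how to implement a framework yourself, always with a questioning attitude   c. the runtime environment (linux, jboss and so on)

7. Read other people's code; when you gain a new insight, refactor your own code or try to apply it
Note:

8. Summarize the theory
Note: once breadth and depth reach a certain level, pay attention to summarizing the theory, and understand and solve problems from a higher level of abstraction


Reminder to myself: the most important thing is to do it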




posted @ 2010-11-20 23:34 石建 | Fat Mind

Preface: one cookie, and I spent a whole afternoon without finding a solution.

 

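I. The problem

1. Scenario: when visiting http://list.mall.daily.taobao.net/50024400/xxx, the page requests ads through ajax from the domain http://tmatch.simba.taobao.com/xxx. The ad engine sets a session-scoped cookie named "_back" on the page, which is used to track pagination.

2. Problem:

When I clicked the pagination on the page, the ads did not page in IE. Comparing with firebug and httpwatch showed that the "_back" cookie in IE was incorrect. At first I suspected the engine's cookie-setting logic, but this interface is used in many places without any problem.

Moreover, pagination worked on some people's machines, so I began to suspect a browser setting. Looking at the HTTP request headers in httpwatch again, I found that _back was not being sent back to the engine (the Cookies view in httpwatch shows this too: a cookie that is sent is marked Sent, and I had only seen Received before). That confirmed it was a browser issue.

3. Solution: open IE's privacy settings. The usual default is "block third-party cookies that do not have a privacy policy ...", which means _back was never written to the client, so requests to the engine could not send _back back, and pagination failed.

         Does that mean pagination is broken for every ad on Taobao? Certainly not. The key is that "first-party cookies come from the website you are browsing, and they can be persistent or temporary; third-party cookies come from ads of other websites on the website you are browsing". To the browser, "taobao.net" and "taobao.com" are two different websites, so the engine's _back cannot be set on the client. This scenario is the daily environment. In production the page is accessed at list.mall.daily.taobao.com, so the "third-party cookie" concept does not apply and the ads display correctly.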
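
The first-party versus third-party distinction above can be sketched in a few lines. This is a deliberately naive illustration: it treats the last two labels of a host name as the "site", which is an assumption made here and not how IE actually decides; ThirdPartyCheck and its methods are hypothetical names.

```java
final class ThirdPartyCheck {
    // Naive "site" key: the last two labels of the host name.
    // Assumption for illustration only; real browsers use richer rules.
    static String site(String host) {
        String lower = host.toLowerCase();
        String[] labels = lower.split("\\.");
        int n = labels.length;
        return n < 2 ? lower : labels[n - 2] + "." + labels[n - 1];
    }

    // A request is third-party when its site differs from the page's site.
    static boolean isThirdParty(String pageHost, String requestHost) {
        return !site(pageHost).equals(site(requestHost));
    }

    public static void main(String[] args) {
        // daily environment: page on taobao.net, ad engine on taobao.com
        System.out.println(isThirdParty("list.mall.daily.taobao.net", "tmatch.simba.taobao.com"));
        // production: both hosts are under taobao.com
        System.out.println(isThirdParty("list.mall.daily.taobao.com", "tmatch.simba.taobao.com"));
    }
}
```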

 

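II. A few facts about cookies

 

1. IE cookie file format

The first line is the "name", the second line the "value", the third line the "owning domain" ... For example, cna is stored under ".taobao.com", so the browser automatically sends this cookie to every subdomain of that domain. Of www.taobao.com and taobao.com, the latter is the root domain and the former is a second-level domain. On XP the storage directory is C:\Documents and Settings\<username>\Cookies\, and files are named: <your user name>@<domain that created the cookie>[<number of times the cookie has changed>].txt

  Reference: http://blog.csdn.net/zhangxinrun/archive/2010/07/31/5779574.aspx

 

2. Cross-domain cookie access from JS

 http://blog.csdn.net/tongdoudpj/archive/2009/05/10/4166096.aspx

 

3. The relationship between cookie and session

The root cause: the HTTP protocol is stateless, and cookies exist to solve exactly this problem.

A session is a way of keeping state between the client and the server. The server stores the content and returns the corresponding key to the client; on the next visit the client brings this key along, and state is maintained.

How a session is implemented:

1. It relies on a cookie. "The session cookie is stored in temporary memory and is not retained after the browser is closed." (Actual test: on IE8 I found no file for a session-level cookie in the location described in 1, so I assume it is "held temporarily in the browser's memory" and the key is lost when the browser closes.)

2. URL rewriting. The Servlet specification defines this feature. When the browser disables cookies, even session-level cookies are not stored. resp.encodeRedirectURL(url) rewrites the URL, and it only takes effect when cookies are disabled. The rewritten result looks like: http://www.demo.com/cookie.do;jsessionid=19gfy1sg740dl1whwd72lbqlhb

Question: how does the server decide whether rewriting is needed? Judging from my experiments, it checks whether it received a cookie with name=JSESSIONID, and if not, it rewrites the URL.

           The best way to confirm this would be to read the tomcat and jetty source, but I did not find the corresponding code.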
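
The behavior observed in the experiment can be sketched as below. This is only my guess at the decision, written as plain Java for illustration; it is not the tomcat or jetty implementation, and SessionUrlRewriter and its parameters are made-up names.

```java
// A sketch of the observed behavior, not a container's real code.
final class SessionUrlRewriter {
    // Appends ";jsessionid=<id>" to the path when the request carried no
    // session cookie; otherwise returns the URL unchanged.
    static String encode(String url, String sessionId, boolean sessionCookieReceived) {
        if (sessionCookieReceived || sessionId == null) {
            return url;
        }
        // The session id goes before the query string, if there is one.
        int q = url.indexOf('?');
        String path = q < 0 ? url : url.substring(0, q);
        String query = q < 0 ? "" : url.substring(q);
        return path + ";jsessionid=" + sessionId + query;
    }

    public static void main(String[] args) {
        String id = "19gfy1sg740dl1whwd72lbqlhb";
        System.out.println(encode("http://www.demo.com/cookie.do", id, false));
        System.out.println(encode("http://www.demo.com/cookie.do", id, true));
    }
}
```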

For details about cookies, see: http://en.wikipedia.org/wiki/HTTP_cookie

 

posted @ 2010-11-08 21:41 石建 | Fat Mind
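Preface: our team lead has started rolling out agile in the team. These notes record the advantages as I currently understand them, and the places where agile does not fit well.

I. Agile as I understand it

1. Risk is spread out. I fully agree with this from first-hand experience. Say jim owns module A. Before: the project manager, pety, would only check with jim on the status of module A shortly before it was due. Now: jim reports his progress and problems to pety every day. This gives pety much better control over the project, and risks surface as early as possible. The problems it brings: module A has to be broken down into finer tasks. How should it be broken down? How is the time estimated, and by whom? (Our team's practice: jim breaks module A down himself and then confirms the plan with pety, who can give his own input at that point, so they estimate the time together.)

2. Visibility. I agree with this. Before: everyone emailed their progress every day. Now: a kanban board (with each person's fine-grained tasks posted on it), plus a daily report of progress and problems. Both practices have the same goal, letting others know your progress and problems, but the second has advantages: a. reading email is an individual, scattered activity, whereas reporting at the board happens with everyone together, and face-to-face communication is more efficient; b. to me, the board is more visual than email; c. with the board, later retrospectives and summaries are also easier.

3. Teamwork. jim owns module A and jeny owns module B. When module A turns out to have more problems than expected (not foreseen in the initial estimate), the hope is that jeny can take over part of it. The problem: jeny does not know module A at all, so the time needed to learn it and the impact on module B both have to be assessed. My understanding: for projects developed by several people at once this approach is applicable, but for day-to-day work done solo, everyone's energy is limited, and it is hard to bring B in to help when A is half done. This is also what the team is currently arguing about.

II. What the team currently does

1. Fine-grained task breakdown. For example, smart navigation on the search page is broken down into: a. understand the parameters the interface needs and the format of the returned result  b. handle the request parameters  c. parse the returned xml result  d. later processing according to the business logic. The whole task is cut very finely, which helps keep project risk under control.
2. Kanban board. It has three parts: tasks, in development, done. Based on everyone's daily feedback, the remaining hours the project needs are updated (tracked down to the hour).
3. Morning stand-up. In the morning everyone stands in front of the board and says "what I did yesterday and whether I hit any problems, and what I plan to do today", and the board's contents and the time the tasks need are adjusted accordingly.
4. Retrospective. We do not do this well yet. Not everyone offers suggestions; the whole team may still be "one brain", with only one person thinking about the problems (of course this has to do with the team's atmosphere).

III. Difficulties

1. Individual initiative and participation
  Agile needs every team member to take part with a sense of ownership, and the team also has to give them recognition; this is a matter of soft strength. For example, people should give each other suggestions, but that first requires a safe environment in the team, where everyone can disagree with what the lead says. Otherwise it is always "one head" doing the thinking while everyone else is used to obeying. People also have to overcome the ingrained habit of taking the middle road, while still paying attention to how they raise suggestions.
2. Task breakdown and effort estimation
  The project manager needs to know the team members very well in order to assign tasks sensibly. With hours set according to each person, everyone can work in an atmosphere of respect and enjoyment, and efficiency is bound to be high.
3. The team's sense of achievement
  It needs to come from the bottom up: the achievement belongs to every person, not simply to him or to me. When everyone finds a sense of recognition and is happy to share what they have gained, every member of the team grows, and the team's strength is bound to rise sharply.


Summary: the difficulties all come down to building the team's soft strength.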
posted @ 2010-11-07 18:47 石建 | Fat Mind
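Preface: these notes mainly record what a colleague shared about database design, plus a few thoughts of my own.

I. Start from the requirements when designing the database, and take the specific database's characteristics into account

  Take a forum that must display the total number of posts, running on mysql with the innodb engine. Right after launch there is no problem at all, but once the site gains popularity and the posts reach the tens of millions, performance problems show up. A good design has to be considered from the requirements stage. For me this is a shift in thinking.
  But if the engine were myisam, would "counting the total" still be a problem, even with hundreds of millions of rows? No: myisam maintains the total row count itself. So a design must be based on the characteristics of the specific database; one case cannot be generalized to all.

II. Principles to follow (unverified)

  1. Let the small result set drive the large result set. Reason: the way mysql executes join queries?
  2. Do the sorting in the index whenever possible. Reason: an index is already ordered.
  3. Select only the columns you need?
  4. Avoid complex joins and subqueries.
  5. Use appropriate data redundancy. Reason: with a post table and a user table, if the post table also stores username, displaying a post never needs a join to fetch the username.
  6. A cache at the application layer?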

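III. Concepts

  1. Vertical split: split by column, that is, one record is stored in several places, and every sub-table has the same number of rows. Take a post table with id, userid, username, content, xxx ... The first four fields are used all the time, while xxx and the many fields after it are rarely used and hold a lot of data. After a vertical split, table1 contains "id, userid, username, content" and table2 contains "xxx, ..." and the other rarely used fields. Advantage: less io. Disadvantage: if the rarely used fields are needed, a join across the tables is required.
  2. Horizontal split: split by record, so different records are stored separately and every sub-table has the same number of columns. For example, Taobao's user transaction data: the user id modulo the number of databases decides which database holds the data. A horizontal split makes the application more complex.
  3. Cluster: improves the availability of the system. Each sharded node is backed by several machines that hold the same data, and load is balanced across them. How to balance the load and how to detect whether a machine is available are new problems?
  4. Master and replicas: surveys of typical internet applications conclude that the read/write ratio is roughly 10:1. Why separate reads from writes: writes involve locking, and whether it is a row lock, a table lock or a block lock, efficiency is low under high concurrency. Writes are concentrated on one node, while reads are served by the other N nodes. Read/write separation introduces a new problem: how does the data on my Master stay synchronized and consistent with the other Slave machines in the cluster?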
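
The horizontal split by user id and the read/write separation described above can be sketched together as a tiny router. Everything here is hypothetical and written only for illustration: the class ShardRouter, the node naming scheme ("shard2-master", "shard2-slave0") and the round-robin choice of slave are my assumptions, not how Taobao does it.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative router: picks a shard by user id, then a node by operation type.
final class ShardRouter {
    private final int shardCount;
    private final int slavesPerShard;
    private final AtomicInteger nextSlave = new AtomicInteger();

    ShardRouter(int shardCount, int slavesPerShard) {
        if (shardCount <= 0 || slavesPerShard <= 0) {
            throw new IllegalArgumentException("counts must be positive");
        }
        this.shardCount = shardCount;
        this.slavesPerShard = slavesPerShard;
    }

    // Horizontal split: user id modulo the number of databases.
    int shardFor(long userId) {
        return (int) Math.floorMod(userId, (long) shardCount);
    }

    // Read/write separation: writes go to the shard's master,
    // reads rotate over its slaves (simple round robin).
    String nodeFor(long userId, boolean write) {
        int shard = shardFor(userId);
        if (write) {
            return "shard" + shard + "-master";
        }
        int slave = Math.floorMod(nextSlave.getAndIncrement(), slavesPerShard);
        return "shard" + shard + "-slave" + slave;
    }

    public static void main(String[] args) {
        ShardRouter router = new ShardRouter(4, 2);
        System.out.println(router.nodeFor(10L, true));
        System.out.println(router.nodeFor(10L, false));
        System.out.println(router.nodeFor(10L, false));
    }
}
```

The sketch also shows where the extra complexity comes from: every query has to go through the router, and a query that spans users now touches several databases.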




posted @ 2010-11-07 17:53 石建 | Fat Mind
Reference: http://en.wikipedia.org/wiki/Join_(SQL)#Sample_tables

Inner joins

  An inner join is the most common join operation used in applications and can be regarded as the default join-type. Inner join creates a new result table by combining column values of two tables (A and B) based upon the join-predicate. The query compares each row of A with each row of B to find all pairs of rows which satisfy the join-predicate. When the join-predicate is satisfied, column values for each matched pair of rows of A and B are combined into a result row. The result of the join can be defined as the outcome of first taking the Cartesian product (or cross-join) of all records in the tables (combining every record in table A with every record in table B)—then return all records which satisfy the join predicate. Actual SQL implementations normally use other approaches like a Hash join or a Sort-merge join where possible, since computing the Cartesian product is very inefficient.

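  Note: an inner join (the default join type) is defined conceptually as first building the Cartesian product as an intermediate table and then filtering it by the join condition. That is very inefficient, so an actual SQL implementation may use a hash join or a sort-merge join instead.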
        
One can further classify inner joins as equi-joins, as natural joins, or as cross-joins.

SELECT *
FROM employee INNER JOIN department
ON employee.DepartmentID = department.DepartmentID;
The following example shows a query which is equivalent to the one from the previous example.

SELECT *
FROM   employee, department
WHERE  employee.DepartmentID = department.DepartmentID;

Outer joins

  An outer join does not require each record in the two joined tables to have a matching record. The joined table retains each record—even if no other matching record exists. Outer joins subdivide further into left outer joins, right outer joins, and full outer joins, depending on which table(s) one retains the rows from (left, right, or both).

Example of a left outer join; the additional result row is the one for John, who has no department:

SELECT *
FROM   employee  LEFT OUTER JOIN department
ON employee.DepartmentID = department.DepartmentID;

Employee.LastName  Employee.DepartmentID  Department.DepartmentName  Department.DepartmentID
Jones              33                     Engineering                33
Rafferty           31                     Sales                      31
Robinson           34                     Clerical                   34
Smith              34                     Clerical                   34
John               NULL                   NULL                       NULL
Steinberg          33                     Engineering                33


Example of a right outer join; the additional result row is the one for Marketing, which has no employees:

SELECT *
FROM   employee RIGHT OUTER JOIN department
ON employee.DepartmentID = department.DepartmentID;

Employee.LastName  Employee.DepartmentID  Department.DepartmentName  Department.DepartmentID
Smith              34                     Clerical                   34
Jones              33                     Engineering                33
Robinson           34                     Clerical                   34
Steinberg          33                     Engineering                33
Rafferty           31                     Sales                      31
NULL               NULL                   Marketing                  35


Example of a full outer join (not supported by MySQL):

SELECT *
FROM   employee
FULL OUTER JOIN department
ON employee.DepartmentID = department.DepartmentID;

Employee.LastName  Employee.DepartmentID  Department.DepartmentName  Department.DepartmentID
Smith              34                     Clerical                   34
Jones              33                     Engineering                33
Robinson           34                     Clerical                   34
John               NULL                   NULL                       NULL
Steinberg          33                     Engineering                33
Rafferty           31                     Sales                      31
NULL               NULL                   Marketing                  35


Self-join

A query to find all pairings of two employees in the same country is desired.

An example solution query could be as follows:

SELECT F.EmployeeID, F.LastName, S.EmployeeID, S.LastName, F.Country
FROM Employee F, Employee S
WHERE F.Country = S.Country
AND F.EmployeeID < S.EmployeeID
ORDER BY F.EmployeeID, S.EmployeeID;

Which results in the following table being generated.

Employee Table after Self-join by Country

EmployeeID  LastName  EmployeeID  LastName   Country
123         Rafferty  124         Jones      Australia
123         Rafferty  145         Steinberg  Australia
124         Jones     145         Steinberg  Australia
305         Smith     306         John       Germany










Join algorithms

Three fundamental algorithms exist for performing a join operation: nested loop join, sort-merge join, and hash join.
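
To make the difference concrete, here is a small in-memory sketch of two of these algorithms, nested loop join and hash join, applied to the employee and department rows from the examples above. It only illustrates the idea and says nothing about how a real database engine implements them; JoinDemo and the String[] row layout are assumptions made for this sketch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class JoinDemo {
    // Employee rows: {LastName, DepartmentID}; the DepartmentID may be null.
    static final String[][] EMPLOYEES = {
        {"Rafferty", "31"}, {"Jones", "33"}, {"Steinberg", "33"},
        {"Robinson", "34"}, {"Smith", "34"}, {"John", null}
    };
    // Department rows: {DepartmentID, DepartmentName}.
    static final String[][] DEPARTMENTS = {
        {"31", "Sales"}, {"33", "Engineering"}, {"34", "Clerical"}, {"35", "Marketing"}
    };

    // Nested loop join: compare every employee with every department.
    static List<String> nestedLoopJoin(String[][] employees, String[][] departments) {
        List<String> out = new ArrayList<String>();
        for (String[] e : employees) {
            for (String[] d : departments) {
                if (e[1] != null && e[1].equals(d[0])) {
                    out.add(e[0] + "-" + d[1]);
                }
            }
        }
        return out;
    }

    // Hash join: index the departments by id once, then probe per employee.
    static List<String> hashJoin(String[][] employees, String[][] departments) {
        Map<String, List<String>> namesById = new HashMap<String, List<String>>();
        for (String[] d : departments) {
            namesById.computeIfAbsent(d[0], k -> new ArrayList<String>()).add(d[1]);
        }
        List<String> out = new ArrayList<String>();
        for (String[] e : employees) {
            List<String> names = e[1] == null ? null : namesById.get(e[1]);
            if (names == null) {
                continue; // no matching department: dropped by an inner join
            }
            for (String name : names) {
                out.add(e[0] + "-" + name);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(nestedLoopJoin(EMPLOYEES, DEPARTMENTS));
        System.out.println(hashJoin(EMPLOYEES, DEPARTMENTS));
    }
}
```

Both methods return the same inner-join rows; the nested loop does employees × departments comparisons, while the hash join does one pass over each input.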




 

posted @ 2010-11-03 15:36 石建 | Fat Mind
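Preface: a new colleague gave a talk on "test-driven" development, and for the first time it felt within my reach. So I have started trying it. Here are a few small thoughts.

1. Everything starts from the test
  Whether the code is complex or simple, start from the test. Practice until it becomes a coding habit.
  My own practice only counts as "pseudo test-driven", because I still do a detailed design up front. But by following it, I feel at ease when requirement changes or bugs force me to modify the code.
  I cannot yet tell what effect it will have on later maintenance.
  Recommended reading: 《测试驱动开发》 (Test-Driven Development).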

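2. Tools
  junit, mockito, emma

  junit: everyone knows it. New things I learned: a. parameterized tests  b. testing private methods (through reflection)  c. RunWith & Suite, for organizing test units
  mockito: a lightweight mock tool. A very troublesome problem in testing is environment dependency, for example depending on the container to create the request object in a web application. mockito solves most of these problems well (static classes and private methods remain unsolved).
  emma: a code coverage tool, available as an eclipse plugin. What it shows: red = not tested; yellow = partially tested, only some of the logic; green = fully tested. (Note: do not chase coverage for its own sake. Remember the 80/20 rule and spend most of your effort on the main logic.)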
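
Point b under junit, testing a private method through reflection, can be sketched like this. PriceCalculator and its applyDiscount method are hypothetical and exist only for the illustration; in a real junit test the call would sit inside a test method.

```java
import java.lang.reflect.Method;

// Hypothetical class under test with a private method.
class PriceCalculator {
    private int applyDiscount(int price, int percent) {
        return price - price * percent / 100;
    }
}

final class PrivateMethodTestDemo {
    // Looks up the private method by name and parameter types, makes it
    // accessible, and invokes it on the given instance.
    static int callApplyDiscount(PriceCalculator target, int price, int percent) {
        try {
            Method m = PriceCalculator.class.getDeclaredMethod(
                    "applyDiscount", int.class, int.class);
            m.setAccessible(true);
            return (Integer) m.invoke(target, price, percent);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(callApplyDiscount(new PriceCalculator(), 200, 25));
    }
}
```

The price of this approach is that the method name is a string, so a rename breaks the test only at run time.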

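3. Habits
  a. Code structure in three parts: prepare (including mocks) to set up the data, action to execute, assert to verify
  b. Method naming: name of the method under test$purpose of the test, for example run$ParameterIsNull
  c. When testing class A, which has two methods run() and prepare(), where run calls prepare and prepare is very slow: how do you test run() on its own?
     Answer: B extends A and overrides prepare (which amounts to mocking prepare), so the logic of run can be tested alone.
  d. Keep up the habit of writing test code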
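
Habits a to c can be put together in one small sketch. It is plain Java so that it runs without junit on the classpath; the bodies of A.run() and A.prepare() are invented for the illustration, and in junit the test method would carry the usual test annotation.

```java
// Hypothetical class under test: run() depends on a slow prepare().
class A {
    // Not private, so that a subclass can override it.
    protected int prepare() {
        try {
            Thread.sleep(5000); // stands in for the expensive work
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return 21;
    }

    public int run() {
        return prepare() * 2;
    }
}

// Test double: overrides prepare() so that run() can be tested alone.
class B extends A {
    @Override
    protected int prepare() {
        return 5;
    }
}

final class ATest {
    // Naming follows "method under test$purpose of the test".
    static void run$DoublesPreparedValue() {
        // prepare
        A a = new B();
        // action
        int result = a.run();
        // assert
        if (result != 10) {
            throw new AssertionError("expected 10 but got " + result);
        }
    }

    public static void main(String[] args) {
        run$DoublesPreparedValue();
        System.out.println("ok");
    }
}
```

The test never waits for the slow prepare(), because B replaces it with a fixed value.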



  

posted @ 2010-11-02 21:31 石建 | Fat Mind
