Change Dir

先知cd——热爱生活是一切艺术的开始

统计

留言簿(18)

积分与排名

“牛”们的博客

各个公司技术

我的链接

淘宝技术

阅读排行榜

评论排行榜

如何高效的实现一个计数器map

这本是多年前一个stackoverflow上的一个讨论,回答中涉及到了多种计数方法。对于一个key-value结构的map,我们在编程时会经常涉及到key是对象,而value是一个integer或long来负责计数,从而统计多个key的频率。

面对这样一个基本需求,可能有很多种实现。比如最基本的使用jdk的map直接实现——value是一个integer或者long。其基本代码型如下:

   1: final Map<String, Integer> freq = new HashMap<String, Integer>();
   2: int count = freq.containsKey(word) ? freq.get(word) : 0;
   3: freq.put(word, count + 1);

逻辑简单,判断是否存在,是则get取值,否则为0,再put进去一个加1后的值。总共要contain判断,get,put做三次方法调用。

当然进一步我们可以把contain判断去掉,代码如下:

   1: final Map<String, Integer> freq = new HashMap<String, Integer>();
   2: Integer count = freq.get(word);
   3: if (count == null) {
   4:     freq.put(word, 1);
   5: } else {
   6:     freq.put(word, count + 1);
   7: }

一般情况,我们做到这个地步,多数人对其逻辑已经满足,简单性能也能接受,试着想一下,难道不是这样吗?get加put,解决了。

当然这样的实现还不够高效,于是我们开始去尝试实现或寻找更高效的方法,看看开源的集合类库是否有需要的:

有个Trove,可以让我们参考一下:

   1: final TObjectIntHashMap<String> freq = new TObjectIntHashMap<String>();
   2: freq.adjustOrPutValue(word, 1, 1);

这样做,非常优雅啊,性能如何呢?不知道,需要看源码了解细节。那再看看大名鼎鼎的guava如何呢?

   1: AtomicLongMap<String> map = AtomicLongMap.create();
   2: map.getAndIncrement(word);

实现依然优雅,但是,但是看这名字,再看源码,好吧,线程安全的,支持并发,这不好搞了,我们场景需要吗?不需要的话直觉告诉我们这肯定是“慢”的。再找:

   1: Multiset<String> bag = HashMultiset.create();
   2: bag.add(word);

这个看上去合适了,bag的实现明显好很多,而且从语义理解上,这样的接口更容易让人理解。

那么这些方法,性能如何呢?做了个简单的比较,将26个英文字母做key,均匀循环若干次比较各个方法的效率(单纯时间效率),而且时间不统计构建开销。外加了一个线程安全版的concurrentMap实现,而其实google的guava里的AtomicLongMap也是包装了juc的concurrentMap而已。里面有最终极的MutableInt方法,找找看吧,性能最好的就是它了。

   1: /**
   2:  * 
   3:  */
   4:
   5:  
   6: import gnu.trove.map.hash.TObjectIntHashMap;
   7:  
   8: import java.util.HashMap;
   9: import java.util.Map;
  10: import java.util.concurrent.ConcurrentHashMap;
  11: import java.util.concurrent.ConcurrentMap;
  12: import java.util.concurrent.atomic.AtomicLong;
  13:  
  14: import com.google.common.collect.HashMultiset;
  15: import com.google.common.collect.Multiset;
  16: import com.google.common.util.concurrent.AtomicLongMap;
  17:  
  18: /**
  19:  * @author Administrator
  20:  * 
  21:  */
  22: public class IntMapTest {
  23:  
  24:     /**
  25:      * @param args
  26:      */
  27:     public static void main(String[] args) {
  28:         // TODO Auto-generated method stub
  29:         int cycles[] = { 100, 1000, 10000, 100000 };
  30:         Tester baseLine = new BaseLine();
  31:         Tester testForNull = new UseNullTest();
  32:         Tester useAtomicLong = new UseAtomicLong();
  33:         Tester useTrove = new UseTrove();
  34:         Tester useMutableInt = new UseMutableInt();
  35:         Tester useGuava = new UseGuava();
  36:         Tester useGuava2 = new UseGuava2();
  37:  
  38:         for (int i = 0; i < cycles.length; i++) {
  39:             System.out.println("-----With " + cycles[i] + " cycles-----");
  40:             baseLine.test(cycles[i]);
  41:             testForNull.test(cycles[i]);
  42:             useAtomicLong.test(cycles[i]);
  43:             useTrove.test(cycles[i]);
  44:             useMutableInt.test(cycles[i]);
  45:             useGuava.test(cycles[i]);
  46:             useGuava2.test(cycles[i]);
  47:             System.out.println("------------------------");
  48:         }
  49:  
  50:     }
  51:  
  52: }
  53:  
  54: abstract class Tester {
  55:     long ms;
  56:     static String[] strs = "abcdefghijklmnopqrstuvwxyz".split("");
  57:  
  58:     void pre() {
  59:         System.out.println("====" + this.getName() + "Test Case ");
  60:         ms = System.currentTimeMillis();
  61:         System.out.println("start at " + ms);
  62:     }
  63:  
  64:     void post() {
  65:         ms = System.currentTimeMillis() - ms;
  66:         System.out.println("Time used: " + ms + " ms");
  67:     }
  68:  
  69:     abstract void doAction(int cycles);
  70:  
  71:     public void test(int cycles) {
  72:         pre();
  73:         doAction(cycles);
  74:         post();
  75:     }
  76:  
  77:     abstract String getName();
  78: }
  79:  
  80: class BaseLine extends Tester {
  81:     final Map<String, Integer> freq = new HashMap<String, Integer>();
  82:  
  83:     @Override
  84:     void doAction(int cycles) {
  85:         for (int i = 0; i < cycles; i++) {
  86:             for (String word : strs) {
  87:                 int count = freq.containsKey(word) ? freq.get(word) : 0;
  88:                 freq.put(word, count + 1);
  89:             }
  90:         }
  91:     }
  92:  
  93:     @Override
  94:     String getName() {
  95:         return "BaseLine";
  96:     }
  97:  
  98: }
  99:  
 100: class UseNullTest extends Tester {
 101:     final Map<String, Integer> freq = new HashMap<String, Integer>();
 102:  
 103:     @Override
 104:     void doAction(int cycles) {
 105:         for (int i = 0; i < cycles; i++) {
 106:             for (String word : strs) {
 107:                 Integer count = freq.get(word);
 108:                 if (count == null) {
 109:                     freq.put(word, 1);
 110:                 } else {
 111:                     freq.put(word, count + 1);
 112:                 }
 113:             }
 114:         }
 115:     }
 116:  
 117:     @Override
 118:     String getName() {
 119:         return "TestForNull";
 120:     }
 121:  
 122: }
 123:  
 124: class UseAtomicLong extends Tester {
 125:     final ConcurrentMap<String, AtomicLong> map = new ConcurrentHashMap<String, AtomicLong>();
 126:  
 127:     @Override
 128:     void doAction(int cycles) {
 129:         for (int i = 0; i < cycles; i++) {
 130:             for (String word : strs) {
 131:                 map.putIfAbsent(word, new AtomicLong(0));
 132:                 map.get(word).incrementAndGet();
 133:             }
 134:         }
 135:     }
 136:  
 137:     @Override
 138:     String getName() {
 139:         return "AtomicLong";
 140:     }
 141:  
 142: }
 143:  
 144: class UseTrove extends Tester {
 145:     final TObjectIntHashMap<String> freq = new TObjectIntHashMap<String>();
 146:  
 147:     @Override
 148:     void doAction(int cycles) {
 149:         for (int i = 0; i < cycles; i++) {
 150:             for (String word : strs) {
 151:                 freq.adjustOrPutValue(word, 1, 1);
 152:             }
 153:         }
 154:     }
 155:  
 156:     @Override
 157:     String getName() {
 158:         return "Trove";
 159:     }
 160:  
 161: }
 162:  
 163: class MutableInt {
 164:     int value = 1; // note that we start at 1 since we're counting
 165:  
 166:     public void increment() {
 167:         ++value;
 168:     }
 169:  
 170:     public int get() {
 171:         return value;
 172:     }
 173: }
 174:  
 175: class UseMutableInt extends Tester {
 176:     Map<String, MutableInt> freq = new HashMap<String, MutableInt>();
 177:  
 178:     @Override
 179:     void doAction(int cycles) {
 180:         for (int i = 0; i < cycles; i++) {
 181:             for (String word : strs) {
 182:                 MutableInt count = freq.get(word);
 183:                 if (count == null) {
 184:                     freq.put(word, new MutableInt());
 185:                 } else {
 186:                     count.increment();
 187:                 }
 188:             }
 189:         }
 190:     }
 191:  
 192:     @Override
 193:     String getName() {
 194:         return "MutableInt";
 195:     }
 196:  
 197: }
 198:  
 199: class UseGuava extends Tester {
 200:     AtomicLongMap<String> map = AtomicLongMap.create();
 201:  
 202:     @Override
 203:     void doAction(int cycles) {
 204:         for (int i = 0; i < cycles; i++) {
 205:             for (String word : strs) {
 206:                 map.getAndIncrement(word);
 207:             }
 208:         }
 209:     }
 210:  
 211:     @Override
 212:     String getName() {
 213:         return "Guava AtomicLongMap";
 214:     }
 215:  
 216: }
 217:  
 218: class UseGuava2 extends Tester {
 219:     Multiset<String> bag = HashMultiset.create();
 220:  
 221:     @Override
 222:     void doAction(int cycles) {
 223:         for (int i = 0; i < cycles; i++) {
 224:             for (String word : strs) {
 225:                 bag.add(word);
 226:             }
 227:         }
 228:     }
 229:  
 230:     @Override
 231:     String getName() {
 232:         return "Guava HashMultiSet";
 233:     }
 234:  
 235: }

输出结果如下:

   1: -----With 100 cycles-----
   2: ====BaseLineTest Case 
   3: start at 1358655702729
   4: Time used: 7 ms
   5: ====TestForNullTest Case 
   6: start at 1358655702736
   7: Time used: 3 ms
   8: ====AtomicLongTest Case 
   9: start at 1358655702739
  10: Time used: 14 ms
  11: ====TroveTest Case 
  12: start at 1358655702753
  13: Time used: 2 ms
  14: ====MutableIntTest Case 
  15: start at 1358655702755
  16: Time used: 2 ms
  17: ====Guava AtomicLongMapTest Case 
  18: start at 1358655702757
  19: Time used: 4 ms
  20: ====Guava HashMultiSetTest Case 
  21: start at 1358655702761
  22: Time used: 7 ms
  23: ------------------------
  24: -----With 1000 cycles-----
  25: ====BaseLineTest Case 
  26: start at 1358655702768
  27: Time used: 17 ms
  28: ====TestForNullTest Case 
  29: start at 1358655702785
  30: Time used: 7 ms
  31: ====AtomicLongTest Case 
  32: start at 1358655702792
  33: Time used: 44 ms
  34: ====TroveTest Case 
  35: start at 1358655702836
  36: Time used: 17 ms
  37: ====MutableIntTest Case 
  38: start at 1358655702853
  39: Time used: 5 ms
  40: ====Guava AtomicLongMapTest Case 
  41: start at 1358655702858
  42: Time used: 9 ms
  43: ====Guava HashMultiSetTest Case 
  44: start at 1358655702868
  45: Time used: 50 ms
  46: ------------------------
  47: -----With 10000 cycles-----
  48: ====BaseLineTest Case 
  49: start at 1358655702918
  50: Time used: 16 ms
  51: ====TestForNullTest Case 
  52: start at 1358655702934
  53: Time used: 14 ms
  54: ====AtomicLongTest Case 
  55: start at 1358655702948
  56: Time used: 29 ms
  57: ====TroveTest Case 
  58: start at 1358655702977
  59: Time used: 10 ms
  60: ====MutableIntTest Case 
  61: start at 1358655702988
  62: Time used: 5 ms
  63: ====Guava AtomicLongMapTest Case 
  64: start at 1358655702993
  65: Time used: 15 ms
  66: ====Guava HashMultiSetTest Case 
  67: start at 1358655703009
  68: Time used: 77 ms
  69: ------------------------
  70: -----With 100000 cycles-----
  71: ====BaseLineTest Case 
  72: start at 1358655703086
  73: Time used: 124 ms
  74: ====TestForNullTest Case 
  75: start at 1358655703210
  76: Time used: 118 ms
  77: ====AtomicLongTest Case 
  78: start at 1358655703329
  79: Time used: 240 ms
  80: ====TroveTest Case 
  81: start at 1358655703569
  82: Time used: 102 ms
  83: ====MutableIntTest Case 
  84: start at 1358655703671
  85: Time used: 45 ms
  86: ====Guava AtomicLongMapTest Case 
  87: start at 1358655703716
  88: Time used: 126 ms
  89: ====Guava HashMultiSetTest Case 
  90: start at 1358655703842
  91: Time used: 98 ms
  92: ------------------------

一般结论:单线程使用MutableInt,多线程使用guava的AtomicLongMap,其实可以看看guava对addAndGet的实现,循环,很有趣。

最后总结一下,我们在对这个问题做优化的时候,明显的思路就是减少方法调用,而MutableInt效率最高,明显的是它将方法调用减少到最小——1次get,指针的威力顿时显现。当然实际业务代码实现的时候还要考虑到多个因素,比如代码可读性,与业务结合等等,我们现实中不一定要追求如此的效率,但是也要避免毫无思考的写下baseline里的代码,因为明显是可优化的,why not?

注:文中单个实现代码来自stackoverflow的各个回答,测试代码本人编写。

ref:

本文提到的so的原帖:http://stackoverflow.com/questions/81346/most-efficient-way-to-increment-a-map-value-in-java

trove:http://trove.starlight-systems.com/

guava:http://code.google.com/p/guava-libraries/

posted on 2013-01-20 12:40 changedi 阅读(4635) 评论(0)  编辑  收藏 所属分类: Java技术


只有注册用户登录后才能发表评论。


网站导航:
博客园   IT新闻   Chat2DB   C++博客   博问