这段时间很忙,呵呵,没时间写blog。
前两天看了一个文本比较的算法,算法的思路我就不多说了,主要说下我的实现。算法参考:
文本比较算法剖析(1)-如何确定最大匹配率
文本比较算法剖析(2)-如何确定最优匹配路径
我的实现步骤是:
1、计算出所有可行的路径
如下图中,N(l,r)所在的位置如果该位置匹配,则肯定要走A区。在扫描A区所有的可行路径,可行路径的标准是下一个可行并且匹配的点,在这一步先不考虑不普配的点。
关键代码 :
public List<CharPoint> getNextPoints(boolean[][] comps, CharPoint p) {
if (comps == null || p == null)
throw new IllegalArgumentException("参数不能为空。");
int maxY = 0;
List<CharPoint> cps = new ArrayList<CharPoint>();
for (int i = p.getX() + 1; i < comps.length; i++) {
if (maxY == 0 && p.getX() + 1 == i) {
maxY = comps[i].length - 1;
}
int maxJ = maxY;
boolean bo = false;
for (int j = p.getY() + 1; j <= maxJ; j++) {
if (comps[i][j]) {
if (!bo) {
bo = true;
maxY = j;
}
CharPoint cp = new CharPoint(i, j);
cp.setFromPoint(p);
cps.add(cp);
}
}
}
return cps;
}
2、计算出最大匹配的路径
获得所有可行的路径后再对所有路径的节点数进行判断,节点最多的则是最大匹配路径。但有可能有几个最大匹配路径。
关键代码:
for (CharPoint cp : map.keySet()) {
// System.out.println(map.get(cp));
if (map.get(cp) == max) {
cps.add(cp);
}
}
for (CharPoint cp : cps) {
cp = TextComparerUtil.copyCharPoint(cp);
while (cp.getFromPoint() != null) {
cp.getFromPoint().setNext(cp);
cp = cp.getFromPoint();
}
cps2.add(cp);
}
在计算完最大匹配路径后在对路径进行补充。使路径完整
关键代码:
public static List<CharPoint> applySetpBySetpPath(boolean[][] comps,
List<CharPoint> cps) {
for (CharPoint cp : cps) {
CharPoint next = cp.getNext();
applySetpPath(comps, cp, next);
}
return cps;
}
private static void applySetpPath(boolean[][] comps, CharPoint cp,
CharPoint next) {
while (next != null) {
if (cp.getX() >= 0 && cp.getY() >= 0
&& cp.getX() + 1 < comps.length
&& cp.getY() + 1 < comps[cp.getX()].length
&& comps[cp.getX()][cp.getY()]) {
cp.setNext(new CharPoint(cp.getX() + 1, cp.getY() + 1));
cp = cp.getNext();
}
for (int i = cp.getX() + 1; i <= next.getX(); i++) {
cp.setNext(new CharPoint(i, cp.getY()));
cp = cp.getNext();
}
for (int i = cp.getY() + 1; i <= next.getY(); i++) {
cp.setNext(new CharPoint(cp.getX(), i));
cp = cp.getNext();
}
next = next.getNext();
}
}
public static CharPoint applyToEnd(boolean[][] comps, CharPoint cp) {
while (cp.getNext() != null) {
cp = cp.getNext();
}
CharPoint next = new CharPoint(comps.length - 1, comps[0].length - 1);
applySetpPath(comps, cp, next);
return cp;
}
3、计算最优路径
计算最优路径的方法很简单,按照作者的算法的介绍,计算最优路径的方法就是最后匹配的节点离root(0,0)点最近的那条路径
4、计算差异
计算差异也很简单,就不介绍了,直接上代码
TextDiff diffdel = null;
TextDiff diffins = null;
while (root != null) {
if (root.getNext() != null) {
CharPoint next = root.getNext();
if (root.getX() == next.getX() && root.getY() != next.getY()) {
if (diffdel == null) {
int start = root.getY();
if (comps[root.getX()][root.getY()]) {
start = next.getY();
}
diffdel = new TextDiff(TextDiff.TYPE_DELETED, start, 0);
}
diffdel.setDiffLength(diffdel.getDiffLength() + 1);
}
if (root.getY() == next.getY() && root.getX() != next.getX()) {
if (diffins == null) {
int start = root.getX();
if (comps[root.getX()][root.getY()]) {
start = next.getX();
}
diffins = new TextDiff(TextDiff.TYPE_INSERTED, start, 0);
}
diffins.setDiffLength(diffins.getDiffLength() + 1);
}
if (root.getX() < comps.length
&& root.getY() < comps[root.getX()].length
&& comps[next.getX()][next.getY()]) {
if (diffdel != null && diffdel.getDiffLength() != 0) {
diffs.add(diffdel);
}
if (diffins != null && diffins.getDiffLength() != 0) {
diffs.add(diffins);
}
diffdel = null;
diffins = null;
}
}
root = root.getNext();
if (root != null && root.getNext() == null) {
if (diffdel != null && diffdel.getDiffLength() != 0) {
diffs.add(diffdel);
}
if (diffins != null && diffins.getDiffLength() != 0) {
diffs.add(diffins);
}
}
}
(代码有BUG。。。。暂停下载)
顺便提前祝各位新春快乐,新年吉祥,和家安康
posted on 2009-01-10 14:17
phyeas 阅读(3504)
评论(1) 编辑 收藏