Posted on 2011-09-14 13:46
oathleo
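On an industrial production site, real-time process data changes quickly and the volume to be stored is large. If every sample were stored, a large amount of disk space would be used up within a short time. In general, much industrial data varies linearly or follows some regular pattern. If the database can decide, based on certain conditions, not to store values that can be ignored, and can compute them by linear or step interpolation when they are queried later, storage efficiency improves considerably and disk space is saved.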
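To make the reconstruction step concrete, here is a minimal sketch of the two interpolation modes mentioned above. The `sample_t` layout and both function names are assumptions made for illustration; they are not taken from any particular database.

```c
#include <assert.h>

/* One archived sample: a timestamp and a value (hypothetical layout). */
typedef struct {
    double t;
    double v;
} sample_t;

/* Linear interpolation between two archived samples a and b at time t. */
double interp_linear(sample_t a, sample_t b, double t)
{
    assert(b.t > a.t && t >= a.t && t <= b.t);
    return a.v + (b.v - a.v) * (t - a.t) / (b.t - a.t);
}

/* Step interpolation: hold the earlier value until the next archived sample. */
double interp_step(sample_t a, sample_t b, double t)
{
    assert(b.t > a.t && t >= a.t && t <= b.t);
    return (t < b.t) ? a.v : b.v;
}
```

With archived samples (t=0, v=10) and (t=4, v=18), a query at t=1 returns 12 under linear interpolation and 10 under step interpolation.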
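The situation described above is online data compression. Data compression here means discarding the data of a field device (hereafter called a measurement point) that is not needed to reproduce its history curve accurately.
A very popular data compression algorithm today is the swinging door compression algorithm developed by the US company OSI Software. It has been used successfully in the PI real-time database system and mainly targets floating-point data.
Analysis of the swinging door compression algorithm: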
With the swinging door algorithm, a value is stored if a straight line drawn between the last stored value and the next value does not come within the compression deviation specification of all the intermediate points. Two slopes are required to carry out this test. The following figure shows the slopes as they are initialized after a value is stored:
Figure 1 – Swinging Door slopes after recording a value
The dotted lines are the two slopes. Let the compression deviation specification be 8. One of the lines is drawn from the last recorded value plus 8 through whichever value maximizes the slope of the line. This is the top dotted line in Figure 1. The other dotted line is drawn from the last recorded value minus 8 through whichever value minimizes the slope of the line. The third line is drawn between the last recorded value and the new value. This is the solid line in Figure 1. The previous value is recorded if the slope of the top dotted line is greater than the slope of the solid line or the slope of the solid line is greater than the slope of the bottom dotted line.
The algorithm ensures that each discarded value falls within the compression deviation specification of the solid line. The compression deviation specification is also the maximum error in a trend of archived values. The next figure shows the lines after four more values have been received.
The next figure shows the arrival of a value which causes the previous value to be recorded.
Figure 2 – Recording a new value
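The claim that the compression deviation is also the maximum error of the archived trend can be checked numerically. The sketch below assumes the raw samples and the indices of the archived samples are already available as arrays. The function name `max_trend_error` and its parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Largest absolute difference between the raw samples v[0..n-1] (taken at
 * times t[0..n-1]) and the trend obtained by joining the archived samples
 * with straight lines. kept[] holds the indices of the archived samples in
 * increasing order; it must start at 0 and end at n-1.
 */
double max_trend_error(const double *t, const double *v, size_t n,
                       const size_t *kept, size_t nkept)
{
    double worst = 0.0;

    assert(nkept >= 2 && kept[0] == 0 && kept[nkept - 1] == n - 1);
    for (size_t k = 0; k + 1 < nkept; k++) {
        size_t a = kept[k], b = kept[k + 1];
        /* Compare every discarded sample with the line from a to b. */
        for (size_t i = a + 1; i < b; i++) {
            double line = v[a] + (v[b] - v[a]) * (t[i] - t[a]) / (t[b] - t[a]);
            double err = v[i] - line;
            if (err < 0.0) err = -err;
            if (err > worst) worst = err;
        }
    }
    return worst;
}
```

If the compression step respects the deviation, the returned value should not exceed it. A larger value points to a bug in the compressor or to a mismatch between the index list and the data.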
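In the swinging door compression algorithm, a straight line is first drawn (on a two-dimensional time/value plot) from the last stored data item to the current data item. If the pending data item, the candidate for storage, does not lie within the compression deviation of the line between the current item and the last stored item, the pending item is stored. The test also needs two sloped lines, the swinging doors, which are shown in Figure 1. The first data item that arrives for a measurement point is always stored directly; otherwise there would be no stored item in the database from which to set up the doors.
The compression deviation is an important parameter of the algorithm. It is a manually chosen absolute error. Put simply, data within the absolute error range is compressed away, and data outside it is not.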
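In addition, implementing the algorithm requires computing the following slopes:
(1) Upper slope K1 = (current item value - (last stored item value - compression deviation)) / (current item time - last stored item time)
(2) Lower slope K2 = (current item value - (last stored item value + compression deviation)) / (current item time - last stored item time)
(3) Middle slope K = (current item value - pending item value) / (current item time - pending item time)
Compression is controlled by comparing these slopes, which are computed from the last stored item, the current item, and the pending item. That is:
If K2 ≤ K ≤ K1, the pending item is compressed (discarded).
If K < K2 or K > K1, the pending item is stored.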
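The three formulas and the comparison above can be transcribed almost literally. The sketch below does exactly that for a single step. The type `point_t` and the function name are hypothetical. It does not track the running maximum and minimum door slopes that the English description mentions.

```c
#include <assert.h>
#include <stdbool.h>

/* A data item: timestamp and value (hypothetical layout). */
typedef struct {
    double t;
    double v;
} point_t;

/* Returns true when the pending item may be discarded, i.e. K2 <= K <= K1. */
bool pending_is_compressed(point_t stored, point_t pending, point_t current,
                           double dev)
{
    assert(current.t > stored.t && current.t > pending.t && dev >= 0.0);

    /* Upper slope: measured from (last stored value - deviation). */
    double k1 = (current.v - (stored.v - dev)) / (current.t - stored.t);
    /* Lower slope: measured from (last stored value + deviation). */
    double k2 = (current.v - (stored.v + dev)) / (current.t - stored.t);
    /* Middle slope: from the pending item to the current item. */
    double k = (current.v - pending.v) / (current.t - pending.t);

    return k2 <= k && k <= k1;
}
```

For example, with stored (0, 0), pending (1, 1), current (2, 2) and a deviation of 1, the slopes are K1 = 1.5, K2 = 0.5 and K = 1, so the pending item is discarded. If the pending item is (1, 5) instead, K becomes -3, which is below K2, so it is stored.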
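The implementation flow is as follows:
1. First data item: store it directly in the database.
2. Second data item: compute the upper and lower slopes between the two items and keep them as the basis for later tests.
3. Third and later data items: compute the upper, middle, and lower slopes and test them: (1) if the swinging door test fails, store the previous data item and save the new upper and lower slopes as the basis for later tests; (2) if the test passes, nothing needs to be stored.
4. Repeat the compression test of step 3 in a loop.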
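The four steps can be sketched as a single function over in-memory arrays. This is one possible reading, not a reference implementation. It uses the door test from the English description: the solid line from the last archived sample to the current one is compared with the running maximum and minimum door slopes. It returns indices rather than writing to a database, and it always keeps the final sample, which is an added assumption. The name `sdt_compress` and its parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * t[] and v[] hold n samples with strictly increasing timestamps; dev is the
 * compression deviation. Indices of the samples to archive are written to
 * kept[] (room for n entries); the number of indices is returned.
 */
size_t sdt_compress(const double *t, const double *v, size_t n,
                    double dev, size_t *kept)
{
    size_t nkept = 0;
    if (n == 0) return 0;
    assert(dev >= 0.0);

    size_t last = 0;            /* index of the last archived sample */
    kept[nkept++] = 0;          /* step 1: the first item is always stored */
    int have_doors = 0;
    double up = 0.0, low = 0.0; /* slopes of the upper and lower doors */

    size_t i = 1;
    while (i < n) {
        double dt = t[i] - t[last];
        assert(dt > 0.0);
        double solid = (v[i] - v[last]) / dt;   /* last archived -> current */

        if (have_doors && (up > solid || solid > low)) {
            /* step 3(1): test failed, archive the previous sample */
            last = i - 1;
            kept[nkept++] = last;
            have_doors = 0;
            continue;           /* re-test sample i against fresh doors */
        }
        /* steps 2 and 3(2): swing the doors so they also cover sample i */
        double s_up = (v[i] - (v[last] + dev)) / dt;
        double s_low = (v[i] - (v[last] - dev)) / dt;
        if (!have_doors || s_up > up) up = s_up;
        if (!have_doors || s_low < low) low = s_low;
        have_doors = 1;
        i++;                    /* step 4: move on to the next sample */
    }
    if (last != n - 1) kept[nkept++] = n - 1;   /* keep the final sample */
    return nkept;
}
```

For samples taken at t = 0..5 with values 0, 1, 2, 3, 10, 11 and a deviation of 0.5, the function keeps indices 0, 3, 4 and 5: the straight run 0..3 collapses to its end points, and the jump to 10 forces the neighbouring samples to be archived.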
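#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>
#include <math.h>

static int maxnum = 3600;   /* number of samples to process */

int main(int argc, char **argv)
{
    int now = 0, start = 0; /* index of the current sample / of the sample the doors start from */
    FILE *fd, *fd1;
    float E = 10.01f;       /* compression deviation */
    float mem, mem_old;     /* current sample and the one read before it */
    float upgate;           /* upper door */
    float downgate;         /* lower door */
    float k1;               /* slope between the upper door and the process data */
    float k2;               /* slope between the lower door and the process data */

    (void)argc;
    (void)argv;

    /* The samples are raw binary floats, so both files are opened in binary mode. */
    fd = fopen("test", "rb");
    fd1 = fopen("test.zip", "wb");
    if (fd == NULL || fd1 == NULL) {
        perror("fopen");
        return EXIT_FAILURE;
    }
    if (fread(&mem, sizeof(float), 1, fd) != 1) {
        fprintf(stderr, "could not read the first sample\n");
        return EXIT_FAILURE;
    }
    mem_old = mem;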
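    for (;;) {
        /* The last sample is always stored, then the run ends. */
        if (now == maxnum - 1) {
            fwrite(&mem, sizeof(float), 1, fd1);
            break;
        }
        /* Store the current sample and set up a new pair of doors around it. */
        fwrite(&mem, sizeof(float), 1, fd1);
        start = now;
        upgate = mem + E;
        downgate = mem - E;
        k1 = -10000;    /* sentinel: smaller than any slope expected in the data */
        k2 = -10000;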
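        for (;;) {
            float slope;

            now++;
            mem_old = mem;
            if (fread(&mem, sizeof(float), 1, fd) != 1) {
                fprintf(stderr, "input ended before %d samples were read\n", maxnum);
                exit(EXIT_FAILURE);
            }
            /* Solid line: from the sample the doors start at to the current
               sample. k2 is kept with the opposite sign, so the slope of the
               lower door is -k2. */
            slope = (mem - (upgate - E)) / (now - start);
            if (k1 > slope || slope > -k2) {
                /* The solid line left the doors: store the previous sample.
                   The outer loop then stores the current sample as well and
                   restarts the doors from it. */
                fwrite(&mem_old, sizeof(float), 1, fd1);
                break;
            }
            /* The doors only swing open: keep the largest slope seen so far. */
            slope = (mem - upgate) / (now - start);
            if (slope > k1) k1 = slope;
            slope = (downgate - mem) / (now - start);
            if (slope > k2) k2 = slope;
            if (now == maxnum - 1) {
                break;
            }
        }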