Decode360's Blog

Mastery comes from diligence and is lost through idleness. QQ: 150355677 MSN: decode360@hotmail.com


Using Regular Expressions in Oracle 10g

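I came across a thread on ITPUB about evaluating arithmetic expressions (the four basic operations) in SQL, and took the opportunity to learn the regular expression functions in Oracle 10g.

Original thread: http://www.itpub.net/viewthread.php?tid=1051167&extra=page%3D1%26amp%3Bfilter%3Ddigest

Here is the original poster's SQL for evaluating the expressions, with some comments of my own added: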

select a.id,
       max(text) text,
       sum(regexp_substr(add_text, '[0-9]+', 1, n)  -- take the nth number in turn
           * decode(regexp_substr('+'||add_text, '[^0-9]', 1, n), '+', 1, -1))  -- take the nth + or - in turn and apply it as the sign of the number that follows
       -- the SUM above is the total of all the + and - terms
       + nvl(sum((select decode(substr(regexp_substr('+'||text, '[+|-]([0-9]+[*|/]+)+[0-9]+', 1, n), 1, 1), '+', 1, -1)
                         -- find the nth part that starts with + or - followed by number, [*|/], number; its first character decides the sign
                         * power(10, sum(log(10, decode(regexp_substr('*'||regexp_substr(text, '([0-9]+[*|/]+)+[0-9]+', 1, n), '[^0-9]', 1, rownum),
                                                        -- inner regexp_substr: the nth run of numbers joined by [*|/]
                                                        -- outer regexp_substr: the rownum-th non-digit character of that run (with * prepended)
                                                        '*',
                                                        regexp_substr(regexp_substr(text, '([0-9]+[*|/]+)+[0-9]+', 1, n), '[0-9]+', 1, rownum),
                                                        -- if it is '*', take the number after the * as is
                                                        1/regexp_substr(regexp_substr(text, '([0-9]+[*|/]+)+[0-9]+', 1, n), '[0-9]+', 1, rownum)
                                                        -- if it is not '*' (that is, it is /), take 1/NUM
                                                        ))))
                         -- the outer LOG and POWER calls turn multiplication and division into addition
                    from dual connect by rownum <= len)  -- a second loop, used to evaluate the multiplications and divisions
                ), 0) wanted
  from (select a.id,
               a.text,
               length(regexp_replace(text, '[0-9]+')) + 1 len,  -- strip the digits to count the operators
               regexp_replace(text, '([0-9]+[*|/]+)+[0-9]+', 0) add_text  -- replace every * or / term with 0
          from t_mar a) a,
       (select rownum n from dual connect by rownum < 100) b
 where a.len >= b.n  -- gives a loop from 1 to a.len for each row
 group by id
 order by id;
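The LOG/POWER step is worth looking at on its own. Oracle has SUM but no built-in product aggregate, and multiplying a list of factors is the same as raising 10 to the sum of their base-10 logarithms, so the query rewrites each product as a sum. A minimal sketch, using literal values of my own choosing (6*4/3 does not come from the original thread):

```sql
-- 6 * 4 / 3, written the way the query does it:
-- a factor n contributes LOG(10, n), a divisor n contributes LOG(10, 1/n)
select power(10, log(10, 6) + log(10, 4) + log(10, 1/3)) as product
  from dual;
```

The result should be 8, or a value extremely close to it, because LOG and POWER return approximations; wrapping the expression in ROUND is a reasonable precaution. LOG also needs a positive argument, so as far as I can tell an expression with a 0 factor would make this approach fail rather than return 0.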

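Apart from the ideas of transforming and classifying the terms, the query relies mainly on regular expressions, so here are the regular expression metacharacters of Oracle 10g as well: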

\        The backslash character can have four different meanings depending on the context. It can:
         ■ Stand for itself
         ■ Quote the next character
         ■ Introduce an operator
         ■ Do nothing
*        Matches zero or more occurrences
+        Matches one or more occurrences
?        Matches zero or one occurrence
|        Alternation operator for specifying alternative matches
^        Matches the beginning of a string by default. In multiline mode, it matches the beginning of any line anywhere within the source string.
$        Matches the end of a string by default. In multiline mode, it matches the end of any line anywhere within the source string.
.        Matches any character in the supported character set except NULL
[ ]      Bracket expression for specifying a matching list that should match any one of the expressions represented in the list. A nonmatching list expression begins with a circumflex (^) and specifies a list that matches any character except for the expressions represented in the list.
( )      Grouping expression, treated as a single subexpression
{m}      Matches exactly m times
{m,}     Matches at least m times
{m,n}    Matches at least m times but no more than n times
\n       The backreference expression (n is a digit between 1 and 9) matches the nth subexpression enclosed between '(' and ')' preceding the \n
[..]     Specifies one collation element, and can be a multicharacter element (for example, [.ch.] in Spanish)
[: :]    Specifies character classes (for example, [:alpha:]). It matches any character within the character class.
[==]     Specifies equivalence classes. For example, [=a=] matches all characters having base letter 'a'.

That makes the list fairly complete. For the details of regexp_substr and regexp_replace, the Oracle SQL Reference is the place to look.
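A few of the metacharacters listed above are easier to remember once seen in action. The following is a small sketch against dual; the input strings and column aliases are made up for illustration and do not come from the original thread:

```sql
select regexp_substr('abc123def', '[[:alpha:]]+', 1, 2) as second_word,   -- character class: 'def'
       regexp_substr('a1b22c333', '[0-9]{2,3}')         as two_or_three,  -- interval {m,n}: '22'
       regexp_replace('2008-12-29',
                      '([0-9]{4})-([0-9]{2})-([0-9]{2})',
                      '\3/\2/\1')                       as reordered,     -- groups reused in the replacement: '29/12/2008'
       regexp_replace('hello hello world',
                      '([a-z]+) \1',
                      '\1')                             as deduped        -- backreference inside the pattern: 'hello world'
  from dual;
```

Note that a character class such as [:alpha:] only works inside a bracket expression, which is why it appears as [[:alpha:]].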

---------------------------------------------------------------------------
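Both the main query above and the practice queries below read from t_mar, which the post never defines. A minimal setup could look like the following; the column types and the sample rows are purely my assumption, chosen only to match the id and text columns that the queries use:

```sql
-- Hypothetical test table: only the names t_mar, id and text come from the post
create table t_mar (
  id   number,
  text varchar2(100)
);

insert into t_mar values (1, '1+2*3-4/2');
insert into t_mar values (2, '10-3*2+8/4*5');
insert into t_mar values (3, '7*8-6+5');
commit;
```

By ordinary arithmetic these rows evaluate to 5, 14 and 55, which gives something to compare the wanted column against.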

A few practice queries:

select text, regexp_replace(text, '[0-9]', 0) from t_mar;  -- every single digit is replaced with 0
select text, regexp_replace(text, '[0-9]+', 0) from t_mar;  -- each run of consecutive digits is replaced with one 0
select text, regexp_replace(text, '[*|/]', 0) from t_mar;  -- every * or / is replaced with 0 (so is a literal |, since | is not special inside brackets)
select text, regexp_replace(text, '[*|/]+', 0) from t_mar;  -- each run of consecutive * or / is replaced with one 0
select text, regexp_replace(text, '[0-9]+[*|/]+', 0) from t_mar;  -- each number together with the * or / that follows it is replaced with one 0
select text, regexp_replace(text, '([0-9]+[*|/]+)+[0-9]+', 0) from t_mar;  -- each chain of numbers joined by * or / is replaced with one 0
select text, regexp_replace(text, '[^0-9]', 0) from t_mar;  -- every non-digit character is replaced with 0
select text, regexp_substr(text, '([0-9]+[*|/]+)+[0-9]+', 1, 1) from t_mar;  -- returns the first chain of numbers joined by * or /
select regexp_substr('34*66_50#$@(97)5', '[^0-9]', 2, 4) from dual;  -- returns the 4th non-digit character, searching from position 2
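The position and occurrence arguments are what drive the loop in the main query, so it helps to vary them on a fixed string. The sketch below reuses the string from the last practice query; the second statement uses an input of my own ('1|2*3/4') to show that | is an ordinary character inside a bracket expression:

```sql
-- Non-digit characters of '34*66_50#$@(97)5' from position 2 onward: * _ # $ @ ( )
select regexp_substr('34*66_50#$@(97)5', '[^0-9]', 2, 1) as occ1,  -- '*'
       regexp_substr('34*66_50#$@(97)5', '[^0-9]', 2, 4) as occ4,  -- '$'
       regexp_substr('34*66_50#$@(97)5', '[^0-9]', 2, 8) as occ8   -- NULL: there are only 7 non-digits
  from dual;

-- Inside [ ] the | is a literal, not the alternation operator
select regexp_replace('1|2*3/4', '[*|/]', '#') as with_bar,     -- '1#2#3#4'
       regexp_replace('1|2*3/4', '[*/]',  '#') as without_bar   -- '1|2#3#4'
  from dual;
```

The NULL for a missing occurrence matters in the main query: once n or rownum runs past the last match, the term becomes NULL and SUM simply skips it. For input that contains only digits and the four operators, '[*/]' behaves the same as the '[*|/]' used in the post.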

posted on 2008-12-29 21:17 by decode360, category: 05.SQL
