Posted on 2008-05-16 10:42
From: NetMD (C++), Board: CPlusPlus
Subject: [FAQ] Sequence points in C/C++
Site: 水木社区 (SMTH) (Wed Feb 7 01:13:41 2007), local post
0. What are side effects
C99 (5.1.2.3) defines side effects as follows:
Accessing a volatile object, modifying an object, modifying a file, or
calling a function that does any of those operations are all side effects,
which are changes in the state of the execution environment.
C++2003 (1.9) puts it this way:
Accessing an object designated by a volatile lvalue, modifying an object,
calling a library I/O function, or calling a function that does any of
those operations are all side effects, which are changes in the state of
the execution environment.
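void foo() {
register int i = 0; // i is placed directly in a register; this article calls it a register variable
// Note: register is only a hint, and i is not guaranteed to end up in a register;
// conversely, an auto variable without the register keyword may also be kept in a register.
// This is only for illustration: assume i really is held in a register.
i = 1; // the register contents change, which is a change in program state,
// so this statement has a side effect
i + 1; // compilers usually warn here: "warning: expression has no effect"
// If the CPU executes this statement it will certainly change some register,
// but the program state does not change: apart from the register that holds i,
// the program state does not include the contents of any other register,
// so this statement has no side effect
}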
In particular, both C99 and C++2003 state that an expression with no effect need not be evaluated:
An actual implementation need not evaluate part of an expression if it
can deduce that its value is not used and that no needed side effects
are produced (including any caused by calling a function or accessing
a volatile object).
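Here is a minimal C++ sketch of the distinction. The names `counter`, `bump` and `square` are hypothetical and do not come from the original post. Calling `bump()` modifies an object outside the function, so the call is a side effect. `square()` only computes a value, so a call whose result is discarded, such as `square(3);`, is the kind of expression an implementation may skip.

```cpp
#include <cassert>

// Hypothetical global object used only for this illustration.
int counter = 0;

// Modifies an object: every call to bump() is a side effect.
int bump() {
    return ++counter;
}

// Only computes a value from its argument: no side effect.
int square(int x) {
    return x * x;
}
```

A discarded `bump();` must still increment `counter`. A discarded `square(3);` changes nothing in the execution environment.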
1. What are sequence points
At certain specified points in the execution sequence called sequence
points, all side effects of previous evaluations shall be complete and
no side effects of subsequent evaluations shall have taken place.
extern int i, j;
i = 0;
j = i;
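In the code above, i = 0 and j = i are each a full expression. The ; marks the end of an expression, so there is a sequence point at the ;. By the definition of a sequence point, at the sequence point after i = 0 and before j = i, the evaluation of i = 0 and all of its side effects must be complete (0 has been written into i), and no side effect of j = i may have started yet. The side effect of j = i is to assign the value of i to j, and the side effect of i = 0 is to set i to 0. If the side effect of i = 0 took place after j = i, j would end up holding the old value of i, which is clearly wrong …
… state cannot be determined, and the standard declares such a program to have undefined behavior. This is explained later.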
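The sequence point at each `;` is what makes the following two-statement sketch well defined. The helper name `copy_after_reset` is hypothetical. By the time `j = i` reads `i`, the store of 0 has already completed.

```cpp
#include <cassert>

// Hypothetical helper: the ';' after each full expression is a sequence point,
// so the store of 0 into i is complete before j = i reads i.
int copy_after_reset(int& i, int& j) {
    i = 0;   // full expression; its side effect is complete at the ';'
    j = i;   // reads the 0 that has already been stored
    return j;
}
```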
2. The relative order of expression evaluation and side effects
Except where noted, the order of evaluation of operands of individual
operators and subexpressions of individual expressions, and the order
in which side effects take place, is unspecified.
extern int *p;
extern int i;
*p = i++; // (1)
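… are all implemented this way. The reason is that evaluating i++ by itself differs from evaluating *p = i++. For a standalone expression …
… the address of i. For *p = i++, if the subexpression i++ had to be evaluated completely first, then an extra register B and an extra instruction would be needed to support *p = i++, because the value of the expression i++ is the old value of i …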
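Below is a small sketch of expression (1). It assumes that `p` does not point to `i`; if it did, `i` would be modified twice between sequence points. Whichever instruction sequence the compiler picks, `*p` receives the old value of `i`, and the increment is complete by the next sequence point. The function name is hypothetical.

```cpp
#include <cassert>

// Hypothetical illustration of *p = i++ from (1). p must not point to i,
// otherwise i would be modified twice between sequence points.
void store_and_increment(int* p, int& i) {
    *p = i++;  // *p receives the old value of i; the increment of i is
               // complete by the next sequence point (the ';')
}
```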
extern int i, j, k, x;
x = (i++) + (j++) + (k++);
The compiler may first compute the value of (i++) + (j++) + (k++), then add 1 to each of i, j and k, and finally …
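Here is a sketch of the x = (i++) + (j++) + (k++) example. The struct `Triple` and the function name are hypothetical. The sum always uses the old values. The standard leaves unspecified when each increment is actually performed, but all three are complete at the `;`, so the result is well defined.

```cpp
#include <cassert>

// Hypothetical holder for the three counters i, j, k.
struct Triple {
    int i, j, k;
};

// The sum uses the old values; when each increment happens is unspecified,
// but all three are complete at the ';' that ends the declaration of x.
int sum_then_increment(Triple& t) {
    int x = (t.i++) + (t.j++) + (t.k++);
    return x;
}
```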
3. The restrictions sequence points place on side effects
Between the previous and next sequence point a scalar object shall
have its stored value modified at most once by the evaluation of an
expression. Furthermore, the prior value shall be accessed only to
determine the value to be stored. The requirements of this paragraph
shall be met for each allowable ordering of the subexpressions of a
full expression; otherwise the behavior is undefined.
That is, every possible execution order must satisfy this condition; otherwise the code has undefined behavior.
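extern int i, a[];
extern int foo(int, int);
i = ++i + 1; // This expression modifies i twice, and both modifications must be written
// back to the object; the final value of i depends on which write-back happens
// last. If the assignment writes back last, i becomes its old value plus 2; if
// ++i writes back last, i becomes its old value plus 1. So the behavior of this
// expression is undefined.
a[i++] = i; // If the left operand of = is evaluated first and the side effect of
// i++ has completed, the right operand is the old value of i plus 1; if the
// side effect of i++ completes last, the right operand is the old value of i.
// The result is indeterminate, so the behavior of this expression is also
// undefined.
foo(foo(0, i++), i++); // The standard does not specify the order in which function
// arguments are evaluated, but it does require a sequence point after all
// arguments have been evaluated and before the function body is entered. This
// expression can therefore be executed in two ways. One way evaluates the
// outer call's i++ first, then the argument i++ of foo(0, i++), and only then
// enters foo(0, i++), with a sequence point just before that entry; this
// still modifies i twice between two adjacent sequence points: undefined.
// The other way evaluates the inner call foo(0, i++) first; because of the
// sequence point at that call, the second i++ is evaluated after a new
// sequence point, so i is not modified twice between adjacent sequence points.
// However, as noted above, the standard requires every possible execution
// order to satisfy the rule for the behavior to be defined, so this code is
// still undefined.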
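One way to remove the undefined behavior is to split each expression so that every object is modified at most once between sequence points. The sketch below makes one particular ordering explicit in each case. Which ordering the original code "meant" is an assumption. The body given to `foo` is a hypothetical stand-in for the `extern` declaration above.

```cpp
#include <cassert>

// Hypothetical stand-in for the extern int foo(int, int) declared above.
int foo(int a, int b) {
    return a * 10 + b;
}

// i = ++i + 1;  rewritten as a single modification of i.
int add_two(int i) {
    i = i + 2;      // one read, one write
    return i;
}

// a[i++] = i;  assuming the intent is "store the old index at a[old i]".
void store_index(int a[], int& i) {
    a[i] = i;       // i is only read here
    ++i;            // separate full expression
}

// foo(foo(0, i++), i++);  with left-to-right order made explicit.
int nested_calls(int& i) {
    int first = i++;    // each ';' is a sequence point
    int second = i++;
    return foo(foo(0, first), second);
}
```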
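… side effects that modify the same object, as in the example i = ++i + 1;, the result of the program depends on …
An expression that has undefined behavior for built-in types may nevertheless be well defined for a UDT (user-defined type), for example: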
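i = i++; // If i is an object of built-in type, this expression modifies i twice
// between two adjacent sequence points: undefined.
// If i is of a UDT, the expression may be i.operator=(i.operator++(0));
// there is a sequence point after the function arguments are evaluated, so
// the expression does not modify i twice between adjacent sequence points: OK.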
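Here is a sketch of the UDT case. It assumes a hypothetical wrapper class `Counter` with a user-written postfix `++` and copy assignment. In this case `i = i++` becomes `i.operator=(i.operator++(0))`. The postfix operator increments `i` and returns a copy of the old value, and the assignment then stores that old value back, so `i` ends up unchanged.

```cpp
#include <cassert>

// Hypothetical user-defined integer wrapper. For this type, i = i++ means
// i.operator=(i.operator++(0)): two function calls, each with a sequence
// point after its arguments are evaluated.
class Counter {
public:
    explicit Counter(int v) : value_(v) {}
    Counter(const Counter& other) : value_(other.value_) {}

    // Postfix ++: increment this object, return a copy of the old value.
    Counter operator++(int) {
        Counter old(*this);
        ++value_;
        return old;
    }

    // Copy assignment: store the other object's value.
    Counter& operator=(const Counter& other) {
        value_ = other.value_;
        return *this;
    }

    int value() const { return value_; }

private:
    int value_;
};

// Hypothetical helper that performs i = i++ on a Counter.
int assign_post_increment(Counter& i) {
    i = i++;  // operator++ completes first (its result is the argument), then operator=
    return i.value();
}
```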
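It follows that common constructs such as printf("%d, %d", i++, i++) are wrong; problems of this kind …
The same problem occurs with cout << i++ << i++. If overload resolution …
… otherwise it is equivalent to operator<<(operator<<(cout, i++), i++). If i is an object of built-in type,
this has the same problem as foo(foo(0, i++), i++): both are undefined behavior, because there exists …
… is well defined, just like i = i++, but this style is still not recommended, because the standard's rules for function …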
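Suppose the intent of `printf("%d, %d", i++, i++)` or `cout << i++ << i++` is to print two consecutive values from left to right; that intent is an assumption. One well-defined rewrite gives each increment its own full expression. The helper below is hypothetical. It writes into a `std::ostringstream` rather than to `cout` only so the result can be checked.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: a well-defined replacement for cout << i++ << i++.
// Each increment is its own full expression, so the order is fixed.
std::string print_two(int& i) {
    std::ostringstream out;
    int first = i++;
    int second = i++;
    out << first << ", " << second;
    return out.str();
}
```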
4. Compiler optimizations across sequence points
A. No read, one write, for example:
i = 0;
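B. One or more reads and one write, where every read is used only to determine the new value, for example:
i = i + 1; // one read, one write
i = i & (i - 1); // two reads, one write; thanks to puke for this example
C. No write, one or more reads, for example: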
j = i & (i - 1);
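The three patterns above can be written as small functions; the names are hypothetical. In each one, an object is written at most once between sequence points. In the B cases, every read of `i` only feeds the value being stored. The expression `i & (i - 1)` clears the lowest set bit, which gives an easy value to check.

```cpp
#include <cassert>

// A: no read, one write.
void reset(int& i) {
    i = 0;
}

// B: one read, one write; the read only determines the new value.
void increment(int& i) {
    i = i + 1;
}

// B: two reads, one write; clears the lowest set bit of i in place.
void clear_lowest_set_bit(int& i) {
    i = i & (i - 1);
}

// C: i is only read; the single write goes to j.
int lowest_set_bit_cleared(int i) {
    int j = i & (i - 1);
    return j;
}
```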
… reading a volatile-qualified object more than once is still undefined behavior, because such a read has a side effect …
extern volatile int i;
if (i != i) { // checks whether i changed within a very short interval
// ...
}
If i != i were optimized into a single read, the result would always be false, so RoachCock argues that the compiler cannot …
int j = i;
if (j != i) { // the two reads of the volatile-qualified variable are separated by a sequence point
// ...
}
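Below is a self-contained version of the rewritten check. A hypothetical global `volatile int sensor` stands in for the `extern volatile int i` above. The declaration `int first = sensor;` ends in a sequence point, so the two volatile reads are separated. In a single-threaded test nothing else writes `sensor`, so the function returns false. On real hardware, an external agent could change the value between the two reads.

```cpp
#include <cassert>

// Hypothetical volatile object standing in for extern volatile int i.
volatile int sensor = 0;

// Reads sensor twice with a sequence point in between, instead of
// writing sensor != sensor.
bool changed_between_reads() {
    int first = sensor;      // first read, complete at the ';'
    return first != sensor;  // second read, after the sequence point
}
```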
bool flag = true;
void foo() {
while (flag) { // (2)
// ...
}
}
… declared as volatile bool. C++2003 describes volatile as follows:
[Note: volatile is a hint to the implementation to avoid aggressive
optimization involving the object because the value of the object
might be changed by means undetectable by an implementation. See 1.9
for detailed semantics. In general, the semantics of volatile are
intended to be the same in C++ as they are in C. ]
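The post does not say what clears `flag` in loop (2). Assume it is a signal handler. In that case, both standards explicitly allow the handler to assign to a `volatile std::sig_atomic_t` object. The `volatile` qualifier forces the loop condition to re-read the object on every iteration. The names below are hypothetical. The sketch calls `std::raise` only to make the example deterministic: `raise` does not return until the handler has run.

```cpp
#include <cassert>
#include <csignal>

// Hypothetical flag written by a signal handler.
volatile std::sig_atomic_t keep_running = 1;

extern "C" void on_sigint(int) {
    keep_running = 0;
}

// Plays the role of loop (2): runs until the handler clears the flag.
int run_until_signal(int limit) {
    keep_running = 1;
    std::signal(SIGINT, on_sigint);
    int iterations = 0;
    while (keep_running) {          // volatile: re-read on every iteration
        ++iterations;
        if (iterations == limit) {
            std::raise(SIGINT);     // the handler runs before raise returns
        }
    }
    return iterations;
}
```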
5. The sequence points defined by C99
— The call to a function, after the arguments have been evaluated.
— The end of the first operand of the following operators:
logical AND && ;
logical OR || ;
conditional ? ;
comma , .
— The end of a full declarator: declarators;
— The end of a full expression:
an initializer;
the expression in an expression statement;
the controlling expression of a selection statement (if or switch);
the controlling expression of a while or do statement;
each of the expressions of a for statement;
the expression in a return statement.
— Immediately before a library function returns.
— After the actions associated with each formatted input/output function
conversion specifier.
— Immediately before and immediately after each call to a comparison
function, and also between any call to a comparison function and any
movement of the objects passed as arguments to that call.
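The operators in the second item are the ones most often relied on in practice. In the sketch below, where the function names are hypothetical, each first operand contains `i++`. The sequence point after the first operand guarantees that the second operand sees the incremented value.

```cpp
#include <cassert>

// comma: the side effect of i++ is complete before the right operand is read.
int comma_result(int i) {
    return (i++, i);
}

// &&: if the left side is true, the right side sees the new value of i.
bool and_result(int i) {
    return i++ == 0 && i == 1;
}

// ?: the selected operand is evaluated after the condition's i++ is complete.
int conditional_result(int i) {
    return i++ ? i : -i;
}
```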
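6. The sequence points defined by C++2003
… when [overloaded operators] use function-call semantics, they do not provide the sequence points specified for the built-in operators, while …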
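The difference can be observed directly. In this sketch, the type `Flag`, its `operator&&`, and the counter `evaluations` are hypothetical. The overloaded `operator&&` is an ordinary function call, so both operands are evaluated before the call; it neither short-circuits nor has a sequence point after the first operand. By contrast, the built-in `&&` on `bool` stops after a false left operand.

```cpp
#include <cassert>

// Hypothetical wrapper type with an overloaded &&.
struct Flag {
    bool value;
};

Flag operator&&(Flag a, Flag b) {
    Flag r = { a.value && b.value };
    return r;
}

int evaluations = 0;  // counts how many operands were evaluated

Flag make(bool v) {
    ++evaluations;
    Flag f = { v };
    return f;
}

// Overloaded &&: a function call, so both arguments are evaluated.
bool overloaded_and_evaluates_both() {
    evaluations = 0;
    Flag r = make(false) && make(true);
    return !r.value && evaluations == 2;
}

// Built-in &&: the right operand is skipped when the left one is false.
bool builtin_and_short_circuits() {
    evaluations = 0;
    bool r = make(false).value && make(true).value;
    return !r && evaluations == 1;
}
```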
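7. How two lvalue-related changes in C++2003 affect sequence points
In C, the result of an assignment operator is not an lvalue; C++2003, however, makes the assignment …
… but this change means that a lot of code that is valid in C has undefined behavior in current C++, for example: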
extern int i;
extern int j;
i = j = 1;
Because the result of (j = 1) is an lvalue, using it as the right operand of the assignment to i requires an lvalue-to-rvalue conversion. That conversion implies a read, so i = j = 1 …
Because C++2003 specifies that the result of an assignment operator is an lvalue, the following code, which is invalid in C99, …
extern int i;
(i += 1) += 2;
… result from an rvalue to an lvalue, which makes even the following code undefined behavior:
extern int i;
extern int j;
i = ++j;
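Again, the reason is that an lvalue used as the right operand of an assignment operator requires an lvalue-to-rvalue conversion; that conversion …
… behavior. For this reason, Andrew Koenig submitted a proposal to the C++ standards committee back in 1999 asking for new sequence points for the assignment operators, but so far the committee has not reached agreement on the issue. I have appended Andrew Koenig's proposal below. If you have the time and interest, take a look; you won't lose anything by skipping it, though :-)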
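One way to stay clear of the lvalue issues described in this section is to give each modification its own full expression. That way, no object is written and then re-read between the same pair of sequence points. The helper names in this sketch are hypothetical.

```cpp
#include <cassert>

// Instead of i = j = 1;
void chained_assign(int& i, int& j) {
    j = 1;
    i = 1;
}

// Instead of (i += 1) += 2;
void add_twice(int& i) {
    i += 1;
    i += 2;
}

// Instead of i = ++j;
void pre_increment_then_copy(int& i, int& j) {
    ++j;
    i = j;
}
```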
222. Sequence points and lvalue-returning operators
Section: 5 expr Status: drafting Submitter: Andrew Koenig Date: 20 Dec 1999
I believe that the committee has neglected to take into account one of
the differences between C and C++ when defining sequence points. As an
example, consider
(a += b) += c;
where a, b, and c all have type int. I believe that this expression has undefined
behavior, even though it is well-formed. It is not well-formed in C,
because += returns an rvalue there. The reason for the undefined
behavior is that it modifies the value of `a' twice between sequence
points.
Expressions such as this one are sometimes genuinely useful. Of course, we could write this particular example as
a += b; a += c;
but what about
void scale(double* p, int n, double x, double y) {
for (int i = 0; i < n; ++i) {
(p[i] *= x) += y;
}
}
All of the potential rewrites involve multiply-evaluating p[i] or unobvious
circumlocutions like creating references to the array element.
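The reference-based rewrite the issue alludes to might look like the following sketch; it is illustrative only. Binding a reference evaluates `p[i]` once, and splitting the two compound assignments puts a sequence point between the two writes.

```cpp
#include <cassert>

// Same signature as the scale() above, rewritten with a reference so that
// p[i] is evaluated once and each write is its own full expression.
void scale(double* p, int n, double x, double y) {
    for (int i = 0; i < n; ++i) {
        double& e = p[i];  // evaluate the element's location once
        e *= x;            // sequence point at the ';'
        e += y;
    }
}
```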
One way to deal with this issue would be to include built-in operators in
the rule that puts a sequence point between evaluating a function's
arguments and evaluating the function itself. However, that might be
overkill: I see no reason to require that in
x[i++] = y;
the contents of `i' must be incremented before the assignment.
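For contrast, `x[i++] = y` is already well defined under the existing rules. It modifies `i` once and `x[old i]` once. Those are two different objects, so the unspecified order of the two stores does not matter. Here is a minimal sketch with a hypothetical `append` helper:

```cpp
#include <cassert>

// Hypothetical helper: x[i++] = y writes i once and x[old i] once.
void append(int x[], int& i, int y) {
    x[i++] = y;
}
```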
A less stringent alternative might be to say that when a built-in
operator yields an lvalue, the implementation shall not subsequently
change the value of that object as a consequence of that operator.
I find it hard to imagine an implementation that does not do this
already. Am I wrong? Is there any implementation out there that does
not `do the right thing' already for (a += b) += c?
5.17 expr.ass paragraph 1 says,
The result of the assignment operation is the value stored in the left
operand after the assignment has taken place; the result is an lvalue.
What is the normative effect of the words "after the assignment has taken
place"? I think that phrase ought to mean that in addition to whatever
constraints the rules about sequence points might impose on the
implementation, assignment operators on built-in types have the
additional constraint that they must store the left-hand side's new
value before returning a reference to that object as their result.
One could argue that as the C++ standard currently stands, the effect of
x = y = 0; is undefined. The reason is that it both fetches and stores
the value of y, and does not fetch the value of y in order to compute
its new value.
I'm suggesting that the phrase "after the
assignment has taken place" should be read as constraining the
implementation to set y to 0 before yielding the value of y as the
result of the subexpression y = 0.
Note that this suggestion is
different from asking that there be a sequence point after evaluation
of an assignment. In particular, I am not suggesting that an order
constraint be imposed on any side effects other than the assignment itself.
Francis Glassborow:
My understanding is that for a single variable:
1) Multiple read accesses without a write are OK
2) A single read access followed by a single write (of a value dependant on the read, so that the read MUST happen first) is OK
3) A write followed by an actual read is undefined behaviour
4) Multiple writes have undefined behaviour
It is the 3) that is often ignored because in practice the compiler hardly
ever codes for the read because it already has that value but in
complicated evaluations with a shortage of registers, that is not
always the case. Without getting too close to the hardware, I think we
both know that a read too close to a write can be problematical on some hardware.
So, in x = y = 0;, the implementation must NOT fetch a
value from y, instead it has to "know" what that value will be (easy
because it has just computed that in order to know what it must, at
some time, store in y). From this I deduce that computing the lvalue
(to know where to store) and the rvalue to know what is stored are two
entirely independent actions that can occur in any order commensurate
with the overall requirements that both operands for an operator be
evaluated before the operator is.
Erwin Unruh:
C distinguishes between the resulting value of an assignment and putting
the value in store. So in C a compiler might implement the statement
x=y=0; either as x=0;y=0; or as y=0;x=0; In C the statement (x += 5) +=
7; is not allowed because the first += yields an rvalue which is not
allowed as left operand to +=. So in C an assignment is not a sequence
of write/read because the result is not really "read".
In C++ we
decided to make the result of assignment an lvalue. In this case we do
not have the option to specify the "value" of the result. That is just
the variable itself (or its address in a different view). So in C++,
strictly speaking, the statement x=y=0; must be implemented as y=0;x=y;
which makes a big difference if y is declared volatile.
I think undefined behaviour should not be the result of a single
mentioning of a variable within an expression. So the statement (x +=5)
+= 7; should NOT have undefined behaviour.
In my view the semantics could be:
1) if the result of an assignment is used as an rvalue, its value is that of
the variable after assignment. The actual store takes place before the
next sequence point, but may be before the value is used. This is
consistent with C usage.
2) if the result of an assignment is used as
an lvalue to store another value, then the new value will be stored in
the variable before the next sequence point. It is unspecified whether
the first assigned value is stored intermediately.
3) if the result
of an assignment is used as an lvalue to take an address, that address
is given (it doesn't change). The actual store of the new value takes
place before the next sequence point.
Jerry Schwarz:
My recollection is different from Erwin's. I am confident that the
intention when we decided to make assignments lvalues was not to change
the semantics of evaluation of assignments. The semantics was supposed
to remain the same as C's.
Erwin seems to assume that because
assignments are lvalues, an assignment's value must be determined by a
read of the location. But that was definitely not our intention. As he
notes this has a significant impact on the semantics of assignment to a
volatile variable. If Erwin's interpretation were correct we would have
no way to write a volatile variable without also reading it.
Lawrence Crowl:
For x=y=0, lvalue semantics implies an lvalue to rvalue conversion on the
result of y=0, which in turn implies a read. If y is volatile, lvalue
semantics implies both a read and a write on y.
The standard
apparently doesn't state whether there is a value dependence of the
lvalue result on the completion of the assignment. Such a statement in
the standard would solve the non-volatile C compatibility issue, and
would be consistent with a user-implemented operator=.
Another possible approach is to state that primitive assignment operators have
two results, an lvalue and a corresponding "after-store" rvalue. The
rvalue result would be used when an rvalue is required, while the
lvalue result would be used when an lvalue is required. However, this
semantics is unsupportable for user-defined assignment operators, or at
least inconsistent with all implementations that I know of. I would not
enjoy trying to write such two-faced semantics.
Erwin Unruh:
The intent was for assignments to behave the same as in C. Unfortunately
the change of the result to lvalue did not keep that. An "lvalue of
type int" has no "int" value! So there is a difference between intent
and the standard's wording.
So we have one of several choices:
1) live with the incompatibility (and the problems it has for volatile variables)
2) make the result of assignment an rvalue (only builtin-assignment, maybe only
for builtin types), which makes some presently valid programs invalid
3) introduce "two-face semantics" for builtin assignments, and clarify the sequence problematics
4) make a special rule for assignment to a volatile lvalue of builtin type
I think the last one has the least impact on existing programs, but it is an ugly solution.
Andrew Koenig:
Whatever we may have intended, I do not think that there is any clean way of making
volatile int v;
int i;
i = v = 42;
have the same semantics in C++ as it does in C. Like it or not, the
subexpression v = 42 has the type ``reference to volatile int,'' so if
this statement has any meaning at all, the meaning must be to store 42
in v and then fetch the value of v to assign it to i.
Indeed, if
v is volatile, I cannot imagine a conscientious programmer writing a
statement such as this one. Instead, I would expect to see
v = 42;
i = v;
if the intent is to store 42 in v and then fetch the (possibly changed) value of v, or
v = 42;
i = 42;
if the intent is to store 42 in both v and i.
What I do want is to ensure that expressions such as ``i = v = 42'' have
well-defined semantics, as well as expressions such as (i = v) = 42 or,
more realistically, (i += v) += 42 .
I wonder if the following resolution is sufficient:
Append to 5.17 expr.ass paragraph 1:
There is a sequence point between assigning the new value to the left operand
and yielding the result of the assignment expression.
I believe
that this proposal achieves my desired effect of not constraining when
j is incremented in x[j++] = y, because I don't think there is a
constraint on the relative order of incrementing j and executing the
assignment. However, I do think it allows expressions such as (i += v)
+= 42, although with different semantics from C if v is volatile.
Notes on 10/01 meeting:
There was agreement that adding a sequence point is probably the right solution.
Notes from the 4/02 meeting:
The working group reaffirmed the sequence-point solution, but we will look
for any counter-examples where efficiency would be harmed.
For drafting, we note that ++x is defined in 5.3.2 expr.pre.incr as
equivalent to x+=1 and is therefore affected by this change. x++ is not
affected. Also, we should update any list of all sequence points.
Notes from October 2004 meeting:
Discussion centered around whether a sequence point “between assigning the new
value to the left operand and yielding the result of the expression”
would require completion of all side effects of the operand expressions
before the value of the assignment expression was used in another
expression. The consensus opinion was that it would, that this is the
definition of a sequence point. Jason Merrill pointed out that adding a
sequence point after the assignment is essentially the same as rewriting
b += a
as
b += a, b
Clark Nelson expressed a desire for something like a “weak” sequence point
that would force the assignment to occur but that would leave the side
effects of the operands unconstrained. In support of this position, he
cited the following expression:
j = (i = j++)
With the proposed addition of a full sequence point after the assignment to
i, the net effect is no change to j. However, both g++ and MSVC++
behave differently: if the previous value of j is 5, the value of the
expression is 5 but j gets the value 6.
Clark Nelson will investigate alternative approaches and report back to the working group.