char *s = "hello"和char s1[]="world"的区别

水木上的帖子

s 是全局数据区，
s1 是函数的栈空间区域，函数执行完成，这个空间就没了
------------------------------------------------------------------------------------------
7.2 可是我听说 char a[ ] 和 char *a 是一样的。
并非如此。(你所听说的应该跟函数的形式参数有关；参见问题 6.4) 数组不是指针。数组定义 char a[6] 请求预留 6 个字符的位置, 并用名称 ``a" 表示。也就是说, 有一个称为 ``a" 的位置, 可以放入 6 个字符。而指针申明 char *p, 请求一个位置放置一个指针, 用名称 ``p" 表示。这个指针几乎可以指向任何位置: 任何字符和任何连续的字符, 或者哪里也不指(参见问题 5.1 和 1.10)。

一个图形胜过千言万语。声明

   char a[] = "hello";
   char *p = "world";

将会初始化下图所示的数据结果:
       +---+---+---+---+---+---+
   a: | h | e | l | l | o |\0 |
   +---+---+---+---+---+---+
   +-----+ +---+---+---+---+---+---+
   p: | *======> | w | o | r | l | d |\0 |
   +-----+ +---+---+---+---+---+---+

根据 x 是数组还是指针, 类似 x[3] 这样的引用会生成不同的代码。认识到这一点大有裨益。以上面的声明为例, 当编译器看到表达式 a[3] 的时候, 它生成代码从 a 的位置开始跳过 3 个, 然后取出那个字符. 如果它看到 p[3], 它生成代码找到 ``p" 的位置, 取出其中的指针值, 在指针上加 3 然后取出指向的字符。换言之, a[3] 是名为 a 的对象 (的起始位置) 之后 3 个位置的值, 而 p[3] 是 p 指向的对象的 3 个位置之后的值. 在上例中, a[3] 和 p[3] 碰巧都是 'l' , 但是编译器到达那里的途径不尽相同。本质的区别在于类似 a 的数组和类似 p 的指针一旦在表达式中出现就会按照不同的方法计算, 不论它们是否有下标。下一问题继续深入解释。参见问题 1.13。

参考资料： [K&R2, Sec. 5.5 p. 104]; [CT&P, Sec. 4.5 pp. 64-5]。

7.3 那么, 在 C 语言中 ``指针和数组等价" 到底是什么意思？
在 C 语言中对数组和指针的困惑多数都来自这句话。说数组和指针 ``等价" 不表示它们相同, 甚至也不能互换。它的意思是说数组和指针的算法定义可以用指针方便的访问数组或者模拟数组。
特别地, 等价的基础来自这个关键定义:

一个 T 的数组类型的左值如果出现在表达式中会蜕变为一个指向数组第一个成员的指针(除了三种例外情况); 结果指针的类型是 T 的指针。
这就是说, 一旦数组出现在表达式中, 编译器会隐式地生成一个指向数组第一个成员地指针, 就像程序员写出了 &a[0] 一样。例外的情况是, 数组为 sizeof 或 & 操作符的操作数, 或者为字符数组的字符串初始值。

作为这个这个定义的后果, 编译器并那么不严格区分数组下标操作符和指针。在形如 a[i] 的表达式中, 根据上边的规则, 数组蜕化为指针然后按照指针变量的方式如 p[i] 那样寻址, 如问题 6.2 所述, 尽管最终的内存访问并不一样。如果你把数组地址赋给指针:

   p = a;

那么 p[3] 和 a[3] 将会访问同样的成员。
参见问题 6.6 和 6.11。
------------------------------------------------------------------------------------------
char *s1 = "hello";
char s2[6] = "hello";

类型指针与类型数组名在很多场合中可等价使用。容易给人造成的印象是两者是等价。
这话不尽然。首先我们要明白这是两个不同的东西。

s1的类型char *，而s2的类型是array of char。
s1初始化为一个指针值，指向一个内存区域，该处有6个字符的数据，
即'h', 'e', 'l', 'l', 'o', '\0'。在运行过程中，s1的值可改变，指向其他任何允许的地址。
但上面的数据("hello")不会在程序退出之前销毁[注：这是另外一个比较迷惑人的细节]，
即使s1变量生命周期结束。

s2初始化为6个字符的数组，也是'h', 'e', 'l', 'l', 'o', '\0'。在运行过程中，s2的内容可改变，
也就是存储在s2中的hello也就"消失"了。

但为什么容易给人造成类型指针与类型数组名可等价的疑惑呢？虽然类型不同，但C规定（为了
追求简洁与灵活性，C假设使用者知道自己代码会有什么结果。）在很多场合下，认为数组名
与类型指针类型兼容。记忆中只有2中情况下，数组名不可等同视为数组指针，&与sizeof操作符。

   void foo1(const char *str) {...};
   void foo2(int size) {return size};
   ...
   char *s1 = "hello";
   char s2[6] = "hello";

   foo1(s1); // ok
   foo1(s2); // ok
   foo1(&s2); // incompatible
   foo2(&s2[0]); // ok

   s1[0] = 0; // error
   s2[0] = 0; // ok

   s1 = s2; // ok
   s2 = s1; // error

   // 下面假设在ia32平台上运行
   foo2(sizeof(s1)); // return 4, pointer size
   foo2(sizeof(s2)); // return 6, array size

只记得上面的这些内容，不知道对错，与大家共同提高。

posted @ 2007-10-01 01:40 ZelluX 阅读(2236) | 评论 (0) | 编辑收藏

C++ 学习笔记 (6)

C++ 学习笔记(6)
1. 类声明中定义的函数都被自动处理为inline函数。

2. Triangular t5(); 这句话似乎声明了一个类的实例，但事实上，C++为了保持与C语法的一致，该语句会被解释成一个返回Triangular对象的函数；正确的声明应该去掉()。

3. 考虑下面一段代码
class Matrix {
public:
    Matrix( int row, int col ) // ...
    ~Matrix() { delete [] _pmat; }
private:
    int _row, _col;
    double *_pmat;
};

{
    Matrix mat(4, 4);
    {
        Matrix mat2 = mat;
    }
}
    把mat复制给mat2后，mat2中的_pmat与mat的_pmat指向同一个数组，在mat2所在的域结束后，mat2被释放，同时删除了_pmat指针指向的内容。错误发生。
    解决办法是在Matrix::Matrix( const Matrix &rhs )中详细指出深层拷贝的方法，同时实现赋值操作符的重载。

4. mutable 和 const
const方法无法修改类的成员，mutable成员除外。

posted @ 2007-10-01 01:08 ZelluX 阅读(396) | 评论 (0) | 编辑收藏

C++ 入门笔记 (5)

1. 查看某个类型的最大最小值：
#include <limits>
int max_int = numeric_limits<int>::max();
double min_dbl = numeric_limits<double>::max();

2. A struct is simply a class whose members are public by default.

3. 类的似有成员的保护依赖于对成员名使用的限制，也就是说，通过地址操作、强制类型转换等技术可以被访问，C++可以防止意外的发生，但不能阻止这种故意的欺骗。只有硬件层面才有可能防止这种恶意访问，但通常在现实中也很难做到。

4. 使用类的初始化列表代替代码体中的赋值语句可以节省很多时间。而且，如果一个数据成员是const的，那么它只能在初始化列表中进行初始化；如果一个数据成员是不具有零参数的构造函数的类类型，那么它也必须在初始化列表中进行初始化。

posted @ 2007-09-24 20:45 ZelluX 阅读(370) | 评论 (0) | 编辑收藏

LL(1) 语法解析

还是做一点笔记，记得牢一些

有了follow和first集合后，就可以构造一张预测解析表(predictive parsing table)了。
具体方法是：
对于任一产生式X -> ƒ，找到first(ƒ)中的每一个元素T，把X -> ƒ填充到X行T列中去；
如果ƒ nullable，还要把X -> ƒ填充到X行follow(ƒ)列中去

预测解析表构造完成后，如果某格中不止一个产生式，则说明该语法不适用于预测解析表；
如果每格至多一个产生式，则该语法被称为LL(1) Left-to-right parse, Leftmost-derivation, 1-symbol lookahead

posted @ 2007-09-24 19:34 ZelluX 阅读(619) | 评论 (0) | 编辑收藏

zz 整理的内核学习计划

开始看lkd，不过最新可以抽出来的时间不多，下星期Java课还有project，还要写一份robocode的项目文档，然后编译课也得跟上，英语高口课还得准备资料。。。先转一篇学习计划过来

发信人: ccedcced (turing), 信区: KernelTech
标题: 整理的内核学习计划
发信站: 水木社区 (Fri Jan 6 18:15:35 2006), 转信

内核学习现在开始
方法就是：
读书（lkd）+源代码阅读-->项目

先从基础做起！
希望大家都能积极地参与讨论，写读书笔记和源码阅读笔记！
------------------------------
内核学习第一周计划--Are you ready?

时间:11月2号--11月5日
阅读内容：前言和第一章
导读：这是比较容易阅读的开头部分，内容比较少而且常识性的内容比较多。请在阅读的过程中考虑如下的问题：

1.linux和unix相比有哪些特点？
2.内核编程和用户空间编程相比有哪些不同之处？
3.自己编译一下内核，你编译成功了么？如果不成功，有什么问题？使用你新编译的内核，
  能顺利启动么？有什么问题？
4.linux内核源代码树中你能找到sg设备驱动是在那个文件中实现的么？sg是什么含意?
  清楚地了解一下内核中源代码树的结构。
Are you ready?
------------------------------
内核学习第二周计划
时间:11.7-11.13
内容:
主要是lkd中文版第一版第二章(英文第二版版第三章)的内容，比较重要。

1.和进程管理相关的内核文件都有哪些?找出来大致浏览一下.
2.什么是进程和线程?在Linux中有什么独特的地方？
3.什么是进程描述符？怎样得到当前进程的进程描述符？进程的内核栈有多大？
4.进程的状态都有哪些？在什么情况下发生转化？
5.Linux中所有进程之间的关系是怎么样的？
6.用户线程和内核线程的区别和联系？
7.Linux是怎样创建进程和线程的？
8.Linux怎样终结进程？
9.对照相应的内核源代码文件，分析一下问题3、5、6、7。
------------------------------
内核学习第三、四周计划
内核学习第三、四周计划：

时间：11.14-11.27

调度是操作系统中非常重要的部分，也是最核心的东西。通过学习这一章，希望大家都能大致掌握Linux的调度策略，以及为什么要这么做，Linux相比其它的系统有什么优点。重点多分析源代码中的算法实现！！我希望我能有时间和大家一起分析代码，也希望大家积极主动地分析一些代码片断。

学习内容：
1.进程调度最基本的原理是什么？
2.列举出几个I/O消耗性和处理器消耗型的进程。
3.Linux都采用了哪些调度的算法？详细解释一下这些算法。
4.进程什么时间进入运行态？什么时间进入休眠（阻塞）状态？
5.了解进程抢占的算法；
6.讨论一下Linux进程调度的实时性怎么样，还有哪些需要提高的地方？
7.自己查找进程调度的相关文件，分析为题3-6。
------------------------------
内核学习第五周计划
时间：11.28--12.5

1、什么是系统调用？

2、为什么需要系统调用？

3、实现系统调用相关的代码有哪些，找出来浏览一下

4、详细阅读getuid()这一下系统调用的实现代码

5、如何导出sys_call_table，有几种方法，注意不同内核版本的区别

6、尝试自己给kernel添加一个简单的系统调用
   功能要求：调用这个系统调用，使用户的uid等于0。（这个题目取自《边干边学》）
7、采用添加系统调用的方式实现一个新功能的利弊有哪些，替代方法是什么？
------------------------------
内核学习第六周计划

这一周就总结了一下中断和下半部相关的知识点，供大家参考一下！同时附件里有我以前学习时候的笔记，写的不好请多多指教。
内核学习第六周计划:
1、如何理解中断、中断上下文和进程上下文的区别、为何中断不能睡眠
2、关于x86中选择子、描述符和各种门的理解
3、查阅相关资料和内核源码理解：
中断是如何发生以及硬件和内核是如何相应的，如何返回的
x86上中断发生时上下文(寄存器)如何保存以及中断返回时上下文如何恢复的，系统的第一个任务是如何启动的

4、内核中安排下半部的理由
5、软中断及其他的下半部策略适用于什么样的任务和场合？
6、下半部可以睡眠么？为什么？
7、2.4和2.6内核中下半部包括哪些部分，为何2.6内核相比2.4内核会做这样的改进
8、阅读内核中关于软中断、tasklet以及工作队列的代码、相关书籍和资料，总结如下两个问题:
软中断、tasklet以及工作队列是如何初始化，注册以及触发的，使用了哪些关键的数据结构及内核变量？
软中断、tasklet以及工作队列都在什么场合下使用？
------------------------------
内核学习第七周计划
内核学习第七周计划：
时间：12月13日--12月19日
内容：内核同步的理论知识。

问题：
1.为什么要进行内核的同步？
2.内核中怎么定义原子操作？
3.竞争产生的条件与加锁的顺序？
4.要保护的对象？
5.死锁产生的条件与解决办法？
6.你有什么比较好的方法来调试多线程的程序？
7.据一个内核中产生竞争的例子。
------------------------------
内核学习第八周计划
第八周：
内容:
上一周我们分析了内核中的同步，这一次我们要接触的是内核中怎么进行同步和互斥。

问题：
1.原子操作的粒度问题；
2.自旋锁的设计及其应用场合,分析自旋锁；
3.信号量及其应用，信号量怎么使用？
4.锁的粒度以及其分类；
5.内核可抢占性的实现及其与锁的关系；
6.smp中需要考虑哪些同步与互斥；
------------------------------
内核学习第九周计划
前面的话：
   内核学习已经进行了两个月的时间，LKD这本书我们基本上已经进行了一半。希望大家在下面多用功。
我们在这个计划里面列出来的只是一些比较基本的问题，而且我们没有给出问题的答案，但是只要你在下面
用功了，我想答案就在这本书里和源代码里面。

本周内容：TIMERS AND TIME MANAGEMENT

1.HZ和jiffies值的定义?
2.内核中怎样解决jiffies的回绕？为什么这样可以解决jiffies回绕？
3.时钟中断处理程序有哪些值得注意的地方？
4.xtime_lock锁和seqlock锁?
5.定时器的实现、使用和竞争条件？
6.udelay()&mdelay()?
------------------------------
内核学习第十周计划：
内存管理
   学习内容
   内存管理是比较庞大的一个部分，在lkd这本书中用了很少的篇幅，从这里面我们基本能看清楚内存管理的概貌。《情景分析》一书关于内存管理的部分讲得比较多，代码分析比较透彻也比较深入。但是相对的难度也比较大，建议先看看lkd这本书，然后再看《情景分析》一书的内存管理。

   问题：
   1.内核中内存的分页、分区；
   2.内核中有哪些函数来获得内存？内核中分配内存要注意什么？
   3.为什么使用slab?slab对象的详细分析。
   4.内核栈上内存的静态分配问题；

posted @ 2007-09-24 11:22 ZelluX 阅读(664) | 评论 (0) | 编辑收藏

总算弄明白生成first和follow集合的算法了

太笨了，看了好久才明白。。。
first集合没有问题

follow集合：
以A -> aBb为例，如果b nullable，则follow(B)包括follow(A)，原因很简单，把A看成一个整体，当作为production的右式时它后面直接跟的元素自然也可能是B后面直接跟的元素，因为b可能为空。

理解follow集合的定义后，虎书上给出的算法
if Yi+1 ... Yk are all nullable
then FOLLOW(Yi) = FOLLOW(Yi) U FOLLOW(X)
if Yi+1 ... Yj-1 are all nullable
then FOLLOW(Yi) = FOLLOW(Yi) U FIRST(Yj)
就不难理解了

posted @ 2007-09-23 22:22 ZelluX 阅读(2453) | 评论 (2) | 编辑收藏

A Note on TMin

关于表示32位整型中最小的数-2147483648的一篇文章,from CSAPP http://csapp.cs.cmu.edu/public/tmin.html

Summary

Due to one of the rules for processing integer constants in ANSI C, the numeric constant -2147483648 is handled in a peculiar way on a 32-bit, two's complement machine.

The problem can be corrected by writing -2147483647-1, rather than -2147483648 in any C code.

Description of Problem

The ANSI C standard requires that an integer constant too large to be represented as a signed integer be ``promoted'' to an unsigned value. When GCC encounters the value 2147483648, it gives a warning message: ``warning: decimal constant is so large that it is unsigned.'' The result is the same as if the value had been written 2147483648U.
ANSI C标准中要求太大的整型常量都表示为unsigned

The compiler processes an expression of the form -X by first reading the expression X and then negating it. Thus, when the C compiler encounters the constant -2147483648, it first processes 2147483648, yielding 2147483648U, and then negates it. The unsigned negation of this value is also 2147483648U. The bit pattern is correct, but the type is wrong!

书上的一个小错误

Writing TMin in Code

The ANSI C standard states that the maximum and minimum integers should be declared as constants INT_MAX and INT_MIN in the file limits.h. Looking at this file on an IA32 Linux machine (in the directory /usr/include), we find the following declarations:

/* Minimum and maximum values a `signed int' can hold.  */

#define INT_MAX	2147483647

#define INT_MIN	(-INT_MAX - 1)

This method of declaring INT_MIN avoids accidental promotion and also avoids any warning messages by the C compiler about integer overflow.

The following are ways to write TMin_32 for a 32-bit machine that give the correct value and type, and don't cause any error messages:

-2147483647-1
(int) 2147483648U
1<<31

The first method is preferred, since it indicates that the result will be a negative number.
表示-2147483648的几种方法

posted @ 2007-09-19 21:26 ZelluX 阅读(559) | 评论 (1) | 编辑收藏

系统恢复记

原来只给机房电脑的Arch配了20G的空间，现在想装个virtualbox，空间自然不够了，于是到windows下删了一个分区用于挂载/home的内容。
然后问题就出来了，由于少了一个分区，原来arch所在的分区(sda8)向前移了一位(sda7)，结果grub启动出错，无法进入系统。
从机房管理员那借来Redhat的安装盘，进入rescue模式，挂在好/dev/sda7后，修改boot/grub/menu.lst中的盘符，重启，问题依旧。
只好先恢复winxp再说，用winxp工具盘启动后，fdisk /mbr，重启，安装grub for dos，再重启，进入grub
grub> root (hd0,6)
grub> setup (hd0)
报错，估计是dos下的问题，只好手动进入arch
grub> root (hd0,6)
grub> kernel /boot/vmlinuz26 root=/dev/sda7 ro
grub> initrd /boot/kernel26.img
grub> boot
进入后仍然有错误，说是sda8无法访问，这时才想起fstab还没改过，但是当前的挂载的sda7还是只读的，只好再用redhat的启动盘启动，挂载sda7后修改fstab中的信息。

问题解决。

posted @ 2007-09-19 16:07 ZelluX 阅读(322) | 评论 (0) | 编辑收藏

[zz] proftpd 学习笔记

摘要: 阅读全文

posted @ 2007-09-18 23:16 ZelluX 阅读(330) | 评论 (0) | 编辑收藏

Catalan 数

n个不同的物体按固定次序入栈，随时可以出栈，求最后可能的出栈序列的总数。
只想到这个问题等价于把n个push和n个pop操作排列，要求任意前几个操作中push数都不能少于pop数，至于这个排列数怎么求就不知道了。请教了peter大牛后，原来这就是一个Catalan数的应用。

Wikipedia上的Catalan numbers:

In combinatorial mathematics, the Catalan numbers form a sequence of natural numbers that occur in various counting problems, often involving recursively defined objects. They are named for the Belgian mathematician Eugène Charles Catalan (1814–1894).

The n^th Catalan number is given directly in terms of binomial coefficients by

$C_n = \frac{1}{n+1}{2n\choose n} = \frac{(2n)!}{(n+1)!\,n!} \qquad\mbox{ for }n\ge 0.$

The first Catalan numbers (sequence A000108 in OEIS) for n = 0, 1, 2, 3, … are

1, 1, 2, 5, 14, 42, 132, 429, 1430, 4862, 16796, 58786, 208012, 742900, 2674440, 9694845, 35357670, 129644790, 477638700, 1767263190, 6564120420, 24466267020, 91482563640, 343059613650, 1289904147324, 4861946401452, …

Properties

An alternative expression for C_n is

$C_n = {2n\choose n} - {2n\choose n-1} \quad\mbox{ for }n\ge 1.$

This shows that C_n is a natural number, which is not a priori obvious from the first formula given. This expression forms the basis for André's proof of the correctness of the formula (see below under second proof).

The Catalan numbers satisfy the recurrence relation

$C_0 = 1 \quad \mbox{and} \quad C_{n+1}=\sum_{i=0}^{n}C_i\,C_{n-i}\quad\mbox{for }n\ge 0.$

They also satisfy:

$C_0 = 1 \quad \mbox{and} \quad C_{n+1}=\frac{2(2n+1)}{n+2}C_n,$

which can be a more efficient way to calculate them.

Asymptotically, the Catalan numbers grow as

$C_n \sim \frac{4^n}{n^{3/2}\sqrt{\pi}}$

in the sense that the quotient of the n^th Catalan number and the expression on the right tends towards 1 for n → ∞. (This can be proved by using Stirling's approximation for n!.)

[edit]

Applications in combinatorics

There are many counting problems in combinatorics whose solution is given by the Catalan numbers. The book Enumerative Combinatorics: Volume 2 by combinatorialist Richard P. Stanley contains a set of exercises which describe 66 different interpretations of the Catalan numbers. Following are some examples, with illustrations of the case C₃ = 5.

C_n is the number of Dyck words of length 2n. A Dyck word is a string consisting of n X's and n Y's such that no initial segment of the string has more Y's than X's (see also Dyck language). For example, the following are the Dyck words of length 6:

XXXYYY XYXXYY XYXYXY XXYYXY XXYXYY.

Re-interpreting the symbol X as an open parenthesis and Y as a close parenthesis, C_n counts the number of expressions containing n pairs of parentheses which are correctly matched:

((())) ()(()) ()()() (())() (()())

C_n is the number of different ways n + 1 factors can be completely parenthesized (or the number of ways of associating n applications of a binary operator). For n = 3 for example, we have the following five different parenthesizations of four factors:

$((ab)c)d \quad (a(bc))d \quad(ab)(cd) \quad a((bc)d) \quad a(b(cd))$

Successive applications of a binary operator can be represented in terms of a binary tree. It follows that C_n is the number of rooted ordered binary trees with n + 1 leaves:

If the leaves are labelled, we have the quadruple factorial numbers.

C_n is the number of non-isomorphic full binary trees with n vertices that have children, usually called internal vertices or branches. (A rooted binary tree is full if every vertex has either two children or no children.)

C_n is the number of monotonic paths along the edges of a grid with n × n square cells, which do not cross the diagonal. A monotonic path is one which starts in the lower left corner, finishes in the upper right corner, and consists entirely of edges pointing rightwards or upwards. Counting such paths is equivalent to counting Dyck words: X stands for "move right" and Y stands for "move up". The following diagrams show the case n = 4:

Image:Catalan number 4x4 grid example.svg

C_n is the number of different ways a convex polygon with n + 2 sides can be cut into triangles by connecting vertices with straight lines. The following hexagons illustrate the case n = 4:

Error creating thumbnail:

C_n is the number of stack-sortable permutations of {1, ..., n}. A permutation w is called stack-sortable if S(w) = (1, ..., n), where S(w) is defined recursively as follows: write w = unv where n is the largest element in w and u and v are shorter sequences, and set S(w) = S(u)S(v)n, with S being the identity for one-element sequences.

C_n is the number of noncrossing partitions of the set { 1, ..., n }. A fortiori, C_n never exceeds the nth Bell number. C_n is also the number of noncrossing partitions of the set { 1, ..., 2n } in which every block is of size 2. The conjunction of these two facts may be used in a proof by mathematical induction that all of the free cumulants of degree more than 2 of the Wigner semicircle law are zero. This law is important in free probability theory and the theory of random matrices.

[edit]

Proof of the formula

There are several ways of explaining why the formula

$C_n = \frac{1}{n+1}{2n\choose n}$

solves the combinatorial problems listed above. The first proof below uses a generating function. The second and third proofs are examples of bijective proofs; they involve literally counting a collection of some kind of object to arrive at the correct formula.

[edit]

First proof

We start with the observation that several of the combinatorial problems listed above can easily be seen to satisfy the recurrence relation

$C_0 = 1 \quad \mbox{and} \quad C_{n+1}=\sum_{i=0}^{n}C_i\,C_{n-i}\quad\mbox{for }n\ge 0.$

For example, every Dyck word w of length ≥ 2 can be written in a unique way in the form

w = Xw₁Yw₂

with (possibly empty) Dyck words w₁ and w₂.

The generating function for the Catalan numbers is defined by

$c(x)=\sum_{n=0}^\infty C_n x^n.$

As explained in the article titled Cauchy product, the sum on the right side of the above recurrence relation is the coefficient of xⁿ in the product

$\left(\sum_{i=0}^\infty C_i x^i\right)^2.$

Therefore

$\left(\sum_{i=0}^\infty C_i x^i\right)^2 = \sum_{n=0}^\infty C_{n+1} x^n.$

Multiplying both sides by x, we get

$x\left(\sum_{i=0}^\infty C_i x^i\right)^2 = \sum_{n=0}^\infty C_{n+1} x^{n+1} = \sum_{n=1}^\infty C_n x^n = -1 + \sum_{n=0}^\infty C_n x^n.$

So we have

$c(x)=1+xc(x)^2,\,$

and hence

$c(x) = \frac{1-\sqrt{1-4x}}{2x}.$

The square root term can be expanded as a power series using the identity

$\sqrt{1+y} = 1 - 2\sum_{n=1}^\infty {2n-2 \choose n-1} \left(\frac{-1}{4}\right)^n \frac{y^n}{n} ,$

which can be proved, for example, by the binomial theorem, (or else directly by considering repeated derivatives of $\sqrt{1+y}$ ) together with judicious juggling of factorials. Substituting this into the above expression for c(x) produces, after further manipulation,

$c(x) = \sum_{n=0}^\infty {2n \choose n} \frac{x^n}{n+1.}$

Equating coefficients yields the desired formula for C_n.

[edit]

Second proof

This proof depends on a trick due to D. André, which is now more generally known as the reflection principle (not to be confused with the Schwarz reflection theorem in complex analysis). It is most easily expressed in terms of the "monotonic paths which do not cross the diagonal" problem (see above).

Figure 1. The green portion of the path is being flipped.

Suppose we are given a monotonic path in an n × n grid that does cross the diagonal. Find the first edge in the path that lies above the diagonal, and flip the portion of the path occurring after that edge, along a line parallel to the diagonal. (In terms of Dyck words, we are starting with a sequence of n X's and n Y's which is not a Dyck word, and exchanging all X's with Y's after the first Y that violates the Dyck condition.) The resulting path is a monotonic path in an (n − 1) × (n + 1) grid. Figure 1 illustrates this procedure; the green portion of the path is the portion being flipped.

Since every monotonic path in the (n − 1) × (n + 1) grid must cross the diagonal at some point, every such path can be obtained in this fashion in precisely one way. The number of these paths is equal to

${2n\choose n-1}$ .

Therefore, to calculate the number of monotonic n × n paths which do not cross the diagonal, we need to subtract this from the total number of monotonic n × n paths, so we finally obtain

${2n\choose n}-{2n\choose n-1}$

which is the nth Catalan number C_n.

[edit]

Third proof

The following bijective proof, while being more involved than the previous one, provides a more natural explanation for the term n + 1 appearing in the denominator of the formula for C_n.

Figure 2. A path with exceedance 5.

Suppose we are given a monotonic path, which may happen to cross the diagonal. The exceedance of the path is defined to be the number of pairs of edges which lie above the diagonal. For example, in Figure 2, the edges lying above the diagonal are marked in red, so the exceedance of the path is 5.

Now, if we are given a monotonic path whose exceedance is not zero, then we may apply the following algorithm to construct a new path whose exceedance is one less than the one we started with.

Starting from the bottom left, follow the path until it first travels above the diagonal.
Continue to follow the path until it touches the diagonal again. Denote by X the first such edge that is reached.
Swap the portion of the path occurring before X with the portion occurring after X.

The following example should make this clearer. In Figure 3, the black circle indicates the point where the path first crosses the diagonal. The black edge is X, and we swap the red portion with the green portion to make a new path, shown in the second diagram.

Figure 3. The green and red portions are being exchanged.

Notice that the exceedance has dropped from three to two. In fact, the algorithm will cause the exceedance to decrease by one, for any path that we feed it.

Figure 4. All monotonic paths in a 3×3 grid, illustrating the exceedance-decreasing algorithm.

It is also not difficult to see that this process is reversible: given any path P whose exceedance is less than n, there is exactly one path which yields P when the algorithm is applied to it.

This implies that the number of paths of exceedance n is equal to the number of paths of exceedance n − 1, which is equal to the number of paths of exceedance n − 2, and so on, down to zero. In other words, we have split up the set of all monotonic paths into n + 1 equally sized classes, corresponding to the possible exceedances between 0 and n. Since there are

${2n\choose n}$

monotonic paths, we obtain the desired formula

$C_n = \frac{1}{n+1}{2n\choose n}.$

Figure 4 illustrates the situation for n = 3. Each of the 20 possible monotonic paths appears somewhere in the table. The first column shows all paths of exceedance three, which lie entirely above the diagonal. The columns to the right show the result of successive applications of the algorithm, with the exceedance decreasing one unit at a time. Since there are five rows, C₃ = 5.

[edit]

Hankel matrix

The n×n Hankel matrix whose (i, j) entry is the Catalan number C_i+j has determinant 1, regardless of the value of n. For example, for n = 4 we have

$\det\begin{bmatrix}1 & 1 & 2 & 5 \\ 1 & 2 & 5 & 14 \\ 2 & 5 & 14 & 42 \\ 5 & 14 & 42 & 132\end{bmatrix} = 1$

Note that if the entries are ``shifted", namely the Catalan numbers C_i+j+1, the determinant is still 1, regardless of the size of n. For example, for n = 4 we have

$\det\begin{bmatrix}1 & 2 & 5 & 14 \\ 2 & 5 & 14 & 42 \\ 5 & 14 & 42 & 132 \\ 14 & 42 & 132 & 429 \end{bmatrix} = 1$ .

The Catalan numbers is the unique sequence with this property.

[edit]

Quadruple factorial

The quadruple factorial is given by $\frac{2n!}{n!}$ , or $\left(n+1\right)! C_n$ . This is the solution to labelled variants of the above combinatorics problems. It is entirely distinct from the multifactorials.

[edit]

History

The Catalan sequence was first described in the 18th century by Leonhard Euler, who was interested in the number of different ways of dividing a polygon into triangles. The sequence is named after Eugène Charles Catalan, who discovered the connection to parenthesized expressions. The counting trick for Dyck words was found by D. André in 1887.

posted @ 2007-09-17 19:55 ZelluX 阅读(1503) | 评论 (2) | 编辑收藏

仅列出标题

Snowdream

char *s = "hello"和char s1[]="world"的区别

C++ 学习笔记 (6)

C++ 入门笔记 (5)

LL(1) 语法解析

zz 整理的内核学习计划

总算弄明白生成first和follow集合的算法了

A Note on TMin

Summary

Description of Problem

Writing TMin in Code

系统恢复记

[zz] proftpd 学习笔记

Catalan 数

Properties

Applications in combinatorics

Proof of the formula

First proof

Second proof

Third proof

Hankel matrix

Quadruple factorial

History

日历

常用链接

留言簿(21)

随笔分类(390)

随笔档案(389)

文章档案(7)

相册

15ers

友情链接

收藏夹

搜索

积分与排名

最新随笔

最新评论

阅读排行榜

评论排行榜