The kernel tries to use as few resources as possible. Here is an example: in kernel 2.4 the kernel stack on i386 is 8K. In kernel 2.6 it can be 4K, if you enable it at compile time (the CONFIG_4KSTACKS option).
Why would the kernel spend effort supporting such a feature when most PCs have more than 1 gigabyte of memory? I think it has something to do with the C10K problem. C10K refers to serving ten thousand concurrent connections; a server, such as a web server, that dedicates one process (or thread) to each connection ends up with more than 10 thousand tasks. Saving 4K in every kernel stack then adds up to 4K * 10K = 40M of memory in total, which is a big deal!
How is this possible? Originally, the per-process kernel-mode stack was also used for interrupt handling, but interrupt handling is not specific to any process. So in 2.6 with 4K stacks, interrupts get their own stacks, one set per CPU: a hard IRQ stack for interrupt handlers and a soft IRQ stack for deferrable functions. Exceptions (including system calls) still run on the per-process stack. So the kernel did not simply squeeze the same work into half the space: the work that used to share one 8K stack is now spread across separate stacks.
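2.4: one 8K stack per process, shared by process kernel mode, exceptions and interrupts.
v.s.
2.6: one 4K stack per process, used in kernel mode and for exceptions (including system calls)
4K hard IRQ stack per CPU, used for interrupt handlers
4K soft IRQ stack per CPU, used for deferrable functions (softirqs, tasklets)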
Besides this, in the 8K stack of 2.4, task_struct sits at the bottom of the stack, where it takes up roughly 1K. In 2.6, only thread_info sits at the bottom of the stack; task_struct is allocated separately by the slab allocator, and thread_info just holds a pointer to it. thread_info is only about 50 bytes.