
synchronization in Linux kernel


Kernel Preemption

The main characteristic of a preemptive kernel is that a process running in kernel mode can be replaced by another process while in the middle of a kernel function.

The main motivation for making a kernel preemptive is to reduce the dispatch latency of the User Mode processes, that is, the delay between the time they become runnable and the time they actually begin running.

Preemption may happen on the way out of an interrupt or exception. When returning from interrupts and exceptions, kernel preemption is disabled whenever the preempt_count field in the thread_info descriptor referenced by the current_thread_info() macro is greater than zero. It is greater than zero when any of the following cases occurs:

1. The kernel is executing an interrupt service routine.
2. The deferrable functions are disabled (always true when the kernel is executing a softirq or tasklet).
3. The kernel preemption has been explicitly disabled by setting the preemption counter to a positive value.

The kernel can be preempted only when it is executing an exception handler (in particular a system call) and kernel preemption has not been explicitly disabled.

When a process is returning from interrupts and exceptions, the local CPU must have local interrupts enabled; otherwise kernel preemption is not performed.


The preempt_enable() macro decreases the preemption counter, then checks whether the TIF_NEED_RESCHED flag is set.

In this case, a process switch request is pending, so the macro invokes the preempt_schedule() function; the corresponding entry point for kernel preemption from interrupt context, preempt_schedule_irq(), essentially executes the following code:

/*
 * this is the entry point to schedule() from kernel preemption
 * off of irq context.
 * Note, that this is called and return with irqs disabled. This will
 * protect us against recursive calling from irq.
 */
asmlinkage void __sched preempt_schedule_irq(void)
{
	struct thread_info *ti = current_thread_info();

	/* Catch callers which need to be fixed */
	BUG_ON(ti->preempt_count || !irqs_disabled());

	do {
		add_preempt_count(PREEMPT_ACTIVE);
		local_irq_enable();
		__schedule();
		local_irq_disable();
		sub_preempt_count(PREEMPT_ACTIVE);

		/*
		 * Check again in case we missed a preemption opportunity
		 * between schedule and now.
		 */
		barrier();
	} while (need_resched());
}


The function checks whether local interrupts are enabled and the preempt_count field of current is zero; if both conditions are true, it invokes schedule() to select another process to run. Therefore, kernel preemption may happen either when a kernel control path (usually, an interrupt handler) is terminated, or when an exception handler reenables kernel preemption by means of preempt_enable(). Kernel preemption may also happen when deferrable functions are enabled.
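
For concreteness, here is a minimal sketch of how kernel code explicitly disables and re-enables preemption around a short critical section (the function and variable names are made up for illustration); preempt_enable() is the point where a pending TIF_NEED_RESCHED can trigger preempt_schedule():

#include <linux/preempt.h>

static int touched;	/* hypothetical per-CPU-sensitive state */

static void touch_state(void)
{
	preempt_disable();	/* preempt_count++: no preemption from here on */
	touched++;		/* short critical section */
	preempt_enable();	/* preempt_count--; may call preempt_schedule()
				 * if TIF_NEED_RESCHED was set meanwhile */
}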

Various types of synchronization techniques used by the kernel


Per-CPU Variables

A per-CPU variable is essentially an array of data structures with one element per CPU in the system. Each CPU works on its own element and should never access the elements belonging to the other CPUs.

While per-CPU variables provide protection against concurrent accesses from several CPUs, they do not provide protection against accesses from asynchronous functions (interrupt handlers and deferrable functions). In these cases, additional synchronization primitives are required.

As a general rule, a kernel control path should access a per-CPU variable with kernel preemption disabled. Just consider, for instance, what would happen if a kernel control path got the address of its local copy of a per-CPU variable and was then preempted and moved to another CPU: the address would still refer to the element of the previous CPU.

The per-CPU elements are initialized in the start_kernel() function:

start_kernel()-->setup_per_cpu_areas();

While allocating the physical space that holds each CPU's copy of the per-CPU variables, this also initializes the __per_cpu_offset[NR_CPUS] array, which is used later to locate the per-CPU copies belonging to a given CPU.

The macros and functions for per-CPU variables:


A statically defined per-CPU variable cannot be accessed like an ordinary variable; the following interface functions must be used:

get_cpu_var(var)
put_cpu_var(var)

/*
 * Must be an lvalue. Since @var must be a simple identifier,
 * we force a syntax error here if it isn't.
 */
#define get_cpu_var(var) (*({				\
	preempt_disable();				\
	&__get_cpu_var(var); }))

/*
 * The weird & is necessary because sparse considers (void)(var) to be
 * a direct dereference of percpu variable (var).
 */
#define put_cpu_var(var) do {				\
	(void)&(var);					\
	preempt_enable();				\
} while (0)
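
As a usage sketch of the two macros above (the counter name is made up; DEFINE_PER_CPU is discussed next):

#include <linux/percpu.h>

static DEFINE_PER_CPU(long, pkt_count);	/* hypothetical per-CPU counter */

static void count_packet(void)
{
	/* get_cpu_var() disables preemption, so the task cannot migrate
	 * to another CPU while it holds a reference to its local copy. */
	get_cpu_var(pkt_count)++;
	put_cpu_var(pkt_count);		/* re-enables preemption */
}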


#define DEFINE_PER_CPU_SECTION(type, name, sec)		\
	__PCPU_ATTRS(sec) PER_CPU_DEF_ATTRIBUTES	\
	__typeof__(type) name

/*
 * Variant on the per-CPU variable declaration/definition theme used for
 * ordinary per-CPU variables.
 */
#define DECLARE_PER_CPU(type, name)			\
	DECLARE_PER_CPU_SECTION(type, name, "")
The arch-specific per-CPU code (arch/arm/include/asm/percpu.h) may define PER_CPU_DEF_ATTRIBUTES to control the attributes of the per-CPU variable; if the arch-specific code does not define it, the generic arch-independent code (include/asm-generic/percpu.h) defines it as empty. This is a good place to mention the software layers of the per-CPU mechanism:
(1) arch-independent interface: include/linux/percpu.h defines the API that other kernel modules use, together with the related data structures. Modules that use per-CPU variables need to include this header.
(2) arch-general interface: include/asm-generic/percpu.h. Whatever is identical across all architectures is factored out and placed under asm-generic. The interfaces and data structures defined in this file are certainly hardware related, but the software abstracts the arch-specific parts away into an arch-general layer. Normally this header is not included directly; include/linux/percpu.h includes it.
(3) arch-specific: the hardware-related interface. arch/arm/include/asm/percpu.h defines the per-CPU related code for the ARM platform.
Back to the main thread, the definition of __PCPU_ATTRS:
#define __PCPU_ATTRS(sec)                        \ 
    __percpu __attribute__((section(PER_CPU_BASE_SECTION sec)))    \ 
    PER_CPU_ATTRIBUTES
PER_CPU_BASE_SECTION defines the base section name symbol:
#ifndef PER_CPU_BASE_SECTION 
#ifdef CONFIG_SMP 
#define PER_CPU_BASE_SECTION ".data..percpu" 
#else 
#define PER_CPU_BASE_SECTION ".data" 
#endif 
#endif
Although there are many different ways to define a static per-CPU variable, they are all similar; they merely place the variable in different sections with different attributes. See the related material for the section layout.

Atomic Operations

On the 80x86:

An atomic operation works by locking the affected memory address in the CPU's cache: the CPU acquires the memory address exclusively in its cache and does not permit any other CPU to acquire or share that address until the operation completes.

When the control unit detects an instruction prefixed by the lock byte (0xf0), it "locks" the memory bus until the instruction is finished.
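
A hedged sketch of the kernel-side atomic API that rides on these locked instructions (the reference count and helpers are hypothetical):

#include <linux/atomic.h>

static atomic_t refs = ATOMIC_INIT(1);	/* hypothetical reference count */

static void get_ref(void)
{
	atomic_inc(&refs);		/* a LOCK-prefixed incl on 80x86 */
}

static int put_ref(void)
{
	/* The decrement and the zero test happen in a single atomic
	 * operation, so two CPUs can never both see the count reach zero. */
	return atomic_dec_and_test(&refs);	/* nonzero if count hit 0 */
}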

Optimization and Memory Barriers

When using optimizing compilers, we should never take for granted that instructions will be performed in the exact order in which they appear in the source code.

An optimization barrier primitive ensures that the assembly language instructions corresponding to C statements placed before the primitive are not mixed by the compiler with assembly language instructions corresponding to C statements placed after the primitive. In Linux, the barrier() macro, which expands into asm volatile("":::"memory"), acts as an optimization barrier.

A memory barrier primitive ensures that the operations placed before the primitive are finished before starting the operations placed after the primitive; that is, instructions before the barrier complete before any instruction after the barrier begins.

On 80x86 processors, the following kinds of assembly language instructions are said to be "serializing" because they act as memory barriers:

1. All instructions that operate on I/O ports.
2. All instructions prefixed by the lock byte (see the section "Atomic Operations").
3. All instructions that write into control registers, system registers, or debug registers (for instance, cli and sti, which change the status of the IF flag in the eflags register).
4. The lfence, sfence, and mfence assembly language instructions, introduced with the Pentium 4 microprocessor to efficiently implement read memory barriers, write memory barriers, and read-write memory barriers, respectively.
5. A few special assembly language instructions; among them, the iret instruction that terminates an interrupt or exception handler.

All of the above instructions have the same effect as a memory barrier.
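
A hedged producer/consumer sketch of where such barriers matter (the flag/data pair is illustrative; smp_wmb() and smp_rmb() are the kernel's SMP write and read barriers, though their header location varies across kernel versions):

#include <asm/barrier.h>	/* smp_wmb(), smp_rmb() on recent kernels */

static int data;
static int ready;

static void producer(void)	/* runs on CPU 0 */
{
	data = 42;
	smp_wmb();	/* make the write to data visible before the flag */
	ready = 1;
}

static int consumer(void)	/* runs on CPU 1 */
{
	if (ready) {
		smp_rmb();	/* do not let the read of data pass the flag */
		return data;
	}
	return -1;	/* not published yet */
}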

Spin Locks

The kernel implementation of spin_lock treats uniprocessor (UP) and multiprocessor (SMP) systems differently. On UP, if spin_lock is not used in interrupt context, the code holding the lock can lose the CPU only through kernel preemption; so for UP it suffices to disable preemption when taking the lock and re-enable it when releasing the lock. On SMP, two code paths can genuinely execute at the same time on different CPUs, and only then is a real lock needed to claim ownership of the resource.

Besides being SMP-safe, a spin lock must also guard against two kinds of pseudo-concurrency, interrupts and preemption; that is, it must be interrupt-safe and preempt-safe.
If an interrupt handler uses a spin lock to access shared data, deadlock must be avoided. For example, suppose thread A on CPU0 acquires lock 1, and between acquiring and releasing it a softirq fires on CPU0 whose handler also tries to acquire the lock. Because the lock holder, thread A, cannot be scheduled to release the lock until the handler exits, CPU0 deadlocks. If the softirq fires on another CPU, say CPU1, there is no problem, because an interrupt on CPU1 does not interrupt the lock holder running on CPU0. So to be interrupt-safe, local CPU interrupts must be disabled before acquiring the lock, as the kernel documentation quoted next explains and the sketch after it illustrates:

The reasons you mustn't use these versions if you have interrupts that play with the spinlock is that you can get deadlocks:

	spin_lock(&lock);
	...
		<- interrupt comes in:
			spin_lock(&lock);

where an interrupt tries to lock an already locked variable. This is ok if the other interrupt happens on another CPU, but it is _not_ ok if the interrupt happens on the same CPU that already holds the lock, because the lock will obviously never be released (because the interrupt is waiting for the lock, and the lock-holder is interrupted by the interrupt and will not continue until the interrupt has been processed).

(This is also the reason why the irq-versions of the spinlocks only need to disable the _local_ interrupts - it's ok to use spinlocks in interrupts on other CPU's, because an interrupt on another CPU doesn't interrupt the CPU that holds the lock, so the lock-holder can continue and eventually releases the lock).
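
Hence the interrupt-safe pattern below, a hedged sketch in which the lock, the shared data, and the handler are hypothetical while the locking calls are the standard kernel API:

#include <linux/spinlock.h>
#include <linux/interrupt.h>

static DEFINE_SPINLOCK(my_lock);	/* hypothetical lock */
static int shared;			/* data shared with an IRQ handler */

/* Process context: disable local IRQs while holding the lock so the
 * handler below can never deadlock against us on this CPU. */
static void update_shared(void)
{
	unsigned long flags;

	spin_lock_irqsave(&my_lock, flags);
	shared++;
	spin_unlock_irqrestore(&my_lock, flags);
}

/* Interrupt context: a plain spin_lock suffices here, because local
 * interrupts are the only same-CPU source of pseudo-concurrency. */
static irqreturn_t my_irq_handler(int irq, void *dev)
{
	spin_lock(&my_lock);
	shared = 0;
	spin_unlock(&my_lock);
	return IRQ_HANDLED;
}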

In Linux, each spin lock is represented by a spinlock_t structure consisting of several fields:

typedef struct spinlock {
	union {
		struct raw_spinlock rlock;

#ifdef CONFIG_DEBUG_LOCK_ALLOC
# define LOCK_PADSIZE (offsetof(struct raw_spinlock, dep_map))
		struct {
			u8 __padding[LOCK_PADSIZE];
			struct lockdep_map dep_map;
		};
#endif
	};
} spinlock_t;

typedef struct raw_spinlock {
	arch_spinlock_t raw_lock;
#ifdef CONFIG_GENERIC_LOCKBREAK
	unsigned int break_lock;
#endif
#ifdef CONFIG_DEBUG_SPINLOCK
	unsigned int magic, owner_cpu;
	void *owner;
#endif
#ifdef CONFIG_DEBUG_LOCK_ALLOC
	struct lockdep_map dep_map;
#endif
} raw_spinlock_t;

typedef struct arch_spinlock {
	unsigned int slock;
} arch_spinlock_t;
There are several macros for manipulating spin locks:


#define spin_lock_init(_lock)				\
do {							\
	spinlock_check(_lock);				\
	raw_spin_lock_init(&(_lock)->rlock);		\
} while (0)

static inline raw_spinlock_t *spinlock_check(spinlock_t *lock)
{
	return &lock->rlock;
}

static inline void spin_lock(spinlock_t *lock)
{
	raw_spin_lock(&lock->rlock);
}


On SMP, the implementation of spin_lock is as follows:

#define raw_spin_lock(lock)	_raw_spin_lock(lock)


static inline void __raw_spin_lock(raw_spinlock_t *lock)
{
	preempt_disable();
	spin_acquire(&lock->dep_map, 0, 0, _RET_IP_);
	LOCK_CONTENDED(lock, do_raw_spin_trylock, do_raw_spin_lock);
}

preempt_disable() disables kernel preemption.

spin_acquire() is a lockdep annotation; it compiles away unless lock debugging is configured.

LOCK_CONTENDED() is a macro. Ignoring CONFIG_LOCK_STAT (which only gathers lock statistics), it reduces to:

#define LOCK_CONTENDED(_lock, try, lock)	lock(_lock)

With CONFIG_LOCK_STAT enabled, it is defined as follows:

#define LOCK_CONTENDED(_lock, try, lock)			\
do {								\
	if (!try(_lock)) {					\
		lock_contended(&(_lock)->dep_map, _RET_IP_);	\
		lock(_lock);					\
	}							\
	lock_acquired(&(_lock)->dep_map, _RET_IP_);		\
} while (0)

So in effect the above ends up executing the do_raw_spin_lock() function:

void do_raw_spin_lock(raw_spinlock_t *lock)
{
	debug_spin_lock_before(lock);
	if (unlikely(!arch_spin_trylock(&lock->raw_lock)))
		__spin_lock_debug(lock);
	debug_spin_lock_after(lock);
}

Finally arch_spin_trylock() is executed, which is platform dependent, as the arch_ prefix suggests.

The spinlock implementation decides whether the lock is free or busy by examining the value of lock->slock, so the decl or incl instructions that different CPUs apply to it must be atomic; otherwise several CPUs could simultaneously see the lock as free and enter the critical section, or all CPUs could see it as busy and deadlock. On x86, the LOCK_PREFIX prefix guarantees that accesses to lock->slock are atomic.

static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
{
	int tmp, new;

	asm volatile("movzwl %2, %0\n\t"
		     "cmpb %h0,%b0\n\t"
		     "leal 0x100(%" REG_PTR_MODE "0), %1\n\t"
		     "jne 1f\n\t"
		     LOCK_PREFIX "cmpxchgw %w1,%2\n\t"
		     "1:"
		     "sete %b1\n\t"
		     "movzbl %b1,%0\n\t"
		     : "=&a" (tmp), "=&q" (new), "+m" (lock->slock)
		     :
		     : "memory", "cc");

	return tmp;
}

For an SMP kernel, LOCK_PREFIX is "\n\tlock". lock is an instruction prefix: for the duration of the following instruction the LOCK signal is asserted, and the memory region accessed by that instruction is held for exclusive access, implemented either by bus locking or by a cache-coherency operation (see Intel System Programming Guide, section 8.1). Note also that the code above is the newer "ticket" implementation: every path that wants the lock takes a ticket, and tickets grow in order. Internally the lock keeps the ticket number of the current holder (owner) and the next ticket to hand out (next), one byte each. When the lock is free, owner == next; when it is locked, next == owner + 1. Acquiring the lock increments next; releasing it increments owner.
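
To make the ticket scheme concrete, here is a hedged plain-C11 rendering of the owner/next logic just described (illustrative only, not the kernel's assembly implementation):

#include <stdatomic.h>

/* Illustrative ticket lock: next is the next ticket to hand out,
 * owner is the ticket currently being served. Free: owner == next. */
struct ticket_lock {
	atomic_uint next;
	atomic_uint owner;
};

static void ticket_lock(struct ticket_lock *l)
{
	/* Take a ticket: atomically fetch and increment next. */
	unsigned int my = atomic_fetch_add(&l->next, 1);

	/* Spin until our ticket is being served. */
	while (atomic_load(&l->owner) != my)
		;	/* busy-wait */
}

static void ticket_unlock(struct ticket_lock *l)
{
	atomic_fetch_add(&l->owner, 1);	/* serve the next ticket */
}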


For a uniprocessor (UP) system:

#define _raw_spin_lock(lock)	__LOCK(lock)

/*
 * In the UP-nondebug case there's no real locking going on, so the
 * only thing we have to do is to keep the preempt counts and irq
 * flags straight, to suppress compiler warnings of unused lock
 * variables, and to add the proper checker annotations:
 */

#define __LOCK(lock) \
  do { preempt_disable(); __acquire(lock); (void)(lock); } while (0)

(void)(lock) merely suppresses the compiler warning about the unused lock variable.

Analysis of spin_unlock:

The spin_unlock macro releases a previously acquired spin lock:

static inline void spin_unlock(spinlock_t *lock)
{
	raw_spin_unlock(&lock->rlock);
}

On SMP this expands to:

static inline void __raw_spin_unlock(raw_spinlock_t *lock)
{
	spin_release(&lock->dep_map, 1, _RET_IP_);
	do_raw_spin_unlock(lock);
	preempt_enable();
}

static inline void do_raw_spin_unlock(raw_spinlock_t *lock) __releases(lock)
{
	arch_spin_unlock(&lock->raw_lock);	/* architecture dependent */
	__release(lock);	/* # define __release(x) __context__(x,-1) */
}
For a uniprocessor system:

#define _raw_spin_unlock(lock)	__UNLOCK(lock)

#define __UNLOCK(lock) \
  do { preempt_enable(); __release(lock); (void)(lock); } while (0)

Read-Copy Update (RCU)

RCU is a synchronization technique designed to protect data structures that are mostly accessed for reading by several CPUs. RCU allows many readers and writers to proceed concurrently (an improvement over seqlocks, which allow only one writer to proceed at a time). Furthermore, RCU is lock-free: it uses no lock or counter shared by all CPUs.

But how can this be achieved without locks? The idea rests on two constraints:

1. Only data structures that are dynamically allocated and referenced by means of pointers can be protected by RCU.
2. No kernel control path can sleep inside a critical region protected by RCU.

A reader must not block while accessing RCU-protected shared data; this is a basic premise on which RCU rests. In other words, while a reader holds a reference to RCU-protected shared data, no context switch may occur on the reader's CPU (spinlocks and rwlocks impose the same requirement). A writer does not compete with readers for any lock when accessing RCU-protected data; only when there is more than one writer does it need some lock to synchronize with the other writers. Before modifying the data, the writer first makes a copy of the element to be modified, performs the modification on the copy, and then registers a callback with the garbage collector so that the real update is carried out at a suitable time. The period spent waiting for that suitable time is called the grace period; a CPU undergoing a context switch is said to pass through a quiescent state; and the grace period is the time needed for every CPU to pass through one quiescent state. The garbage collector invokes the writer-registered callbacks after the grace period to complete the real modification or release of the data.
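
A hedged sketch of the resulting copy/publish/reclaim pattern, using the standard primitives of more recent kernels (the cfg structure and both functions are hypothetical):

#include <linux/rcupdate.h>
#include <linux/slab.h>

struct cfg {				/* hypothetical RCU-protected data */
	int threshold;
};

static struct cfg __rcu *global_cfg;

/* Reader: cheap, never blocks, sees either the old or the new copy. */
static int read_threshold(void)
{
	struct cfg *c;
	int val;

	rcu_read_lock();
	c = rcu_dereference(global_cfg);
	val = c ? c->threshold : -1;
	rcu_read_unlock();
	return val;
}

/* Writer: copy, update, publish, then reclaim after a grace period. */
static int update_threshold(int t)
{
	struct cfg *old, *new;

	new = kmalloc(sizeof(*new), GFP_KERNEL);
	if (!new)
		return -ENOMEM;
	new->threshold = t;

	old = rcu_dereference_protected(global_cfg, 1);	/* single writer */
	rcu_assign_pointer(global_cfg, new);	/* publish the new copy */
	synchronize_rcu();	/* wait for pre-existing readers to finish */
	kfree(old);		/* now safe to free the old copy */
	return 0;
}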

static inline void __rcu_read_lock(void)
{
	preempt_disable();
}

static inline void __rcu_read_unlock(void)
{
	preempt_enable();
}

/**
 * struct rcu_head - callback structure for use with RCU
 * @next: next update requests in a list
 * @func: actual update function to call after the grace period.
 */
struct rcu_head {
	struct rcu_head *next;
	void (*func)(struct rcu_head *head);
};

Using RCU (adapted from https://www.ibm.com/developerworks/cn/linux/l-rcu/):

1. List operations limited to insertion and deletion

In this scenario the vast majority of operations are list traversals, i.e. reads; the rare writes only add or remove list entries and never modify an entry in place. This makes RCU very easy to apply, and the conversion from rwlock is natural. Routing-table maintenance is the classic example: most operations are lookups, and the writes merely add or delete routes, so replacing the original rwlock with RCU is straightforward. System-call auditing is a similar case.

With an rwlock:

 static enum audit_state audit_filter_task(struct task_struct *tsk)
        {
                struct audit_entry *e;
                enum audit_state   state;
                read_lock(&auditsc_lock);
                /* Note: audit_netlink_sem held by caller. */
                list_for_each_entry(e, &audit_tsklist, list) {
                        if (audit_filter_rules(tsk, &e->rule, NULL, &state)) {
                                read_unlock(&auditsc_lock);
                                return state;
                        }
                }
                read_unlock(&auditsc_lock);
                return AUDIT_BUILD_CONTEXT;
        }
With RCU:

 static enum audit_state audit_filter_task(struct task_struct *tsk)
        {
                struct audit_entry *e;
                enum audit_state   state;
                rcu_read_lock();
                /* Note: audit_netlink_sem held by caller. */
                list_for_each_entry_rcu(e, &audit_tsklist, list) {
                        if (audit_filter_rules(tsk, &e->rule, NULL, &state)) {
                                rcu_read_unlock();
                                return state;
                        }
                }
                rcu_read_unlock();
                return AUDIT_BUILD_CONTEXT;
        }
The conversion is very direct: rcu_read_lock and rcu_read_unlock replace read_lock and read_unlock, and the list traversal switches to the _rcu version. The original write side looks like this:

 static inline int audit_del_rule(struct audit_rule *rule,
                                         struct list_head *list)
        {
                struct audit_entry  *e;
                write_lock(&auditsc_lock);
                list_for_each_entry(e, list, list) {
                        if (!audit_compare_rule(rule, &e->rule)) {
                                list_del(&e->list);
                                write_unlock(&auditsc_lock);
                                return 0;
                        }
                }
                write_unlock(&auditsc_lock);
                return -EFAULT;         /* No matching rule */
        }
        static inline int audit_add_rule(struct audit_entry *entry,
                                         struct list_head *list)
        {
                write_lock(&auditsc_lock);
                if (entry->rule.flags & AUDIT_PREPEND) {
                        entry->rule.flags &= ~AUDIT_PREPEND;
                        list_add(&entry->list, list);
                } else {
                        list_add_tail(&entry->list, list);
                }
                write_unlock(&auditsc_lock);
                return 0;
        }
After converting to RCU:

static inline int audit_del_rule(struct audit_rule *rule,
                                         struct list_head *list)
        {
                struct audit_entry  *e;
                /* Do not use the _rcu iterator here, since this is the only
                 * deletion routine. */
                list_for_each_entry(e, list, list) {
                        if (!audit_compare_rule(rule, &e->rule)) {
                                list_del_rcu(&e->list);
                                call_rcu(&e->rcu, audit_free_rule, e);
                                return 0;
                        }
                }
                return -EFAULT;         /* No matching rule */
        }
        static inline int audit_add_rule(struct audit_entry *entry,
                                         struct list_head *list)
        {
                if (entry->rule.flags & AUDIT_PREPEND) {
                        entry->rule.flags &= ~AUDIT_PREPEND;
                        list_add_rcu(&entry->list, list);
                } else {
                        list_add_tail_rcu(&entry->list, list);
                }
                return 0;
        }
For the list deletion, list_del is replaced by list_del_rcu plus call_rcu, because the deleted entry may still be referenced by readers; it cannot be freed immediately, but only after every reader has passed through a quiescent state. Note that list_for_each_entry is not replaced by list_for_each_entry_rcu here: this is the only deletion routine and there is a single writer, so the _rcu version is unnecessary.

Normally, write_lock and write_unlock would be replaced by spin_lock and spin_unlock; but for a write side that only adds and deletes entries and has a single writer, once the _rcu versions of the list API are used the rwlock can be removed entirely, and no spinlock is needed to synchronize with readers. In the example above, the caller already holds audit_netlink_sem, so no spinlock is required.

This case tolerates updates becoming visible only after some delay, and the writer merely adds and deletes list entries, so converting to RCU is very easy.

2. The write side must modify list entries

If the writer needs to modify a list entry, it first copies the entry to be modified, performs the modification on the copy, and then replaces the original entry with the copy; the replaced entry is safely deleted after a grace period.

static inline int audit_upd_rule(struct audit_rule *rule,
                                         struct list_head *list,
                                         __u32 newaction,
                                         __u32 newfield_count)
        {
                struct audit_entry  *e;
                struct audit_newentry *ne;
                write_lock(&auditsc_lock);
                /* Note: audit_netlink_sem held by caller. */
                list_for_each_entry(e, list, list) {
                        if (!audit_compare_rule(rule, &e->rule)) {
                                e->rule.action = newaction;
                                e->rule.file_count = newfield_count;
                                write_unlock(&auditsc_lock);
                                return 0;
                        }
                }
                write_unlock(&auditsc_lock);
                return -EFAULT;         /* No matching rule */
        }
After converting to RCU:

static inline int audit_upd_rule(struct audit_rule *rule,
                                         struct list_head *list,
                                         __u32 newaction,
                                         __u32 newfield_count)
        {
                struct audit_entry  *e;
                struct audit_newentry *ne;
                list_for_each_entry(e, list, list) {
                        if (!audit_compare_rule(rule, &e->rule)) {
                                ne = kmalloc(sizeof(*entry), GFP_ATOMIC);
                                if (ne == NULL)
                                        return -ENOMEM;
                                audit_copy_rule(&ne->rule, &e->rule);
                                ne->rule.action = newaction;
                                ne->rule.file_count = newfield_count;
                                list_replace_rcu(e, ne);
                                call_rcu(&e->rcu, audit_free_rule, e);
                                return 0;
                        }
                }
                return -EFAULT;         /* No matching rule */
        }

3. Updates must be immediately visible

The cases above assume readers can tolerate seeing stale data for some time after an update, i.e. for a while after the modification they may still observe the original data. In many situations readers cannot tolerate stale data, and additional measures are needed. System V IPC, for example, adds a deleted field to each list entry, set to true when the entry is removed and false otherwise; code traversing the list checks each entry's deleted field and treats entries marked deleted as nonexistent.
Taking the system-call audit code as the example again, if stale data cannot be tolerated, the read side should be modified as follows:

static enum audit_state audit_filter_task(struct task_struct *tsk)
        {
                struct audit_entry *e;
                enum audit_state   state;
                rcu_read_lock();
                list_for_each_entry_rcu(e, &audit_tsklist, list) {
                        if (audit_filter_rules(tsk, &e->rule, NULL, &state)) {
                                spin_lock(&e->lock);
                                if (e->deleted) {
                                        spin_unlock(&e->lock);
                                        rcu_read_unlock();
                                        return AUDIT_BUILD_CONTEXT;
                                }
                                rcu_read_unlock();
                                return state;
                        }
                }
                rcu_read_unlock();
                return AUDIT_BUILD_CONTEXT;
        }
Note that in this case every list entry needs a spinlock, because the deletion operation modifies the entry's deleted flag. Moreover, if the function finds a matching entry, it should return while still holding that entry's lock; only then is the caller guaranteed to see the freshly modified data rather than stale data.
 static inline int audit_del_rule(struct audit_rule *rule,
                                         struct list_head *list)
        {
                struct audit_entry  *e;
                /* Do not use the _rcu iterator here, since this is the only
                 * deletion routine. */
                list_for_each_entry(e, list, list) {
                        if (!audit_compare_rule(rule, &e->rule)) {
                                spin_lock(&e->lock);
                                list_del_rcu(&e->list);
                                e->deleted = 1;
                                spin_unlock(&e->lock);
                                call_rcu(&e->rcu, audit_free_rule, e);
                                return 0;
                        }
                }
                return -EFAULT;         /* No matching rule */
        }

In SELinux's AVC, the dcache, the IPC code, and elsewhere, RCU has replaced rwlocks to obtain higher performance. It has drawbacks, though: the deferred deletion or freeing ties up memory, which can be an expensive overhead, especially on embedded systems. In addition, the writer's overhead is relatively large, particularly when stale data cannot be tolerated or when there is more than one writer, in which case writers need a spinlock or some other locking mechanism to synchronize with one another.


Original article: http://blog.csdn.net/u012681083/article/details/51239924
