宝贝我爱你-一个诡异的时间问题追查

2012年04月02日 15:03:22

一个诡异的时间问题追查

问题引入：

问题是我们重装了一台电脑之后，发现apache的日志里突然增加了很多服务时间超长的请求，如下面两条：

** - - 4913229 11227 [29/Mar/2012:09:59:58 +0800] "GET*****

** - - 4793564 9846 [29/Mar/2012:09:59:58 +0800] "GET *****

请求服务时间都在4s以上！这绝对无法忍受啊！

分析过程：

但是，当我们：

* 手动访问这台服务器之后，发现基本上请求都是正常响应的；

* 从前端反向代理的日志来看，服务时间也是正常的；

那这样推断，真相只有一个，那就是apache的日志时间记录有问题！

什么问题呢?不清楚，但是apache的时间记录不外乎是在收到请求时记个时间，请求处理完成之后再记一个时间，既然是时间的问题，那么就看看系统时间吧，从最简单的 date 开始，多执行几次：

apache2 # date

Wed Mar 28 17:25:08 CST 2012

apache2 # date

Wed Mar 28 17:25:04 CST 2012 时间减少了？

apache2 # date

Wed Mar 28 17:25:06 CST 2012

apache2 # date

Wed Mar 28 17:25:12 CST 2012

apache2 # date

Wed Mar 28 17:25:13 CST 2012

apache2 # date

Wed Mar 28 17:25:09 CST 2012 时间减少了？

apache2 # date

Wed Mar 28 17:25:09 CST 2012

apache2 # date

Wed Mar 28 17:25:15 CST 2012

那么第一个怀疑就是系统里有进程在自己不断的调时间，比如ntpdate, ntpd，那么我们首先把 ntpd给停掉；然后系统也没有任何地方会调用ntpdate；

继续观察，发现停掉ntpd后，现象依旧！

还有什么可能呢？ date程序是从内核里面读取时间还是说需要读取硬件时间？到底是怎么计算的呢？

请教下同事们，据说 sys_time 是读的kernel里的时间，而 sys_gettimeofday是需要读硬件时钟的。

那么就测试一把吧：

先写个程序调用 time(), 然后打印结果，结果是这样的：

1332986690

1332986692

1332986693

1332986694

1332986695

1332986696

1332986697

1332986698

1332986699

确实没有问题了！看来有可能是硬件问题了，再看看clock_gettime 的调用：

{

struct timespec ts;

int ret = 0;

ret = clock_gettime(CLOCK_REALTIME, ts);

printf("ret: %d, time: %d %d", ts.tv_sec, ts.tv_ns

}

编译: gcc -g -o clock_gettime getdate.c -lrt

~ # ./clock_gettime

ret: 1332991087, time: 594076000 -1

ret: 1332991088, time: 379009000 -1

ret: 1332991089, time: 68561000 -1

ret: 1332991089, time: 714013000 -1

ret: 1332991086, time: 511250000 -1 时间在跳

ret: 1332991091, time: 945626000 -1

ret: 1332991092, time: 650479000 -1

ret: 1332991088, time: 734780000 -1

ret: 1332991094, time: 506114000 -1时间在跳

看来果然有可能是硬件问题！

那么，检查一下，直接调用一下 hwclock --show ，看看是不是可以验证这个问题：

~ # hwclock --show

Thu 29 Mar 2012 10:22:21 AM CST -0.948923 seconds

~ # hwclock --show

Thu 29 Mar 2012 10:22:22 AM CST -0.188950 seconds

~ # hwclock --show

Thu 29 Mar 2012 10:22:23 AM CST -0.244766 seconds

~ # hwclock --show

Thu 29 Mar 2012 10:22:24 AM CST -0.336868 seconds

~ # hwclock --show

Thu 29 Mar 2012 10:22:25 AM CST 4.237159 seconds

~ # hwclock --show

Thu 29 Mar 2012 10:22:26 AM CST 4.238672 seconds

~ # hwclock --show

Thu 29 Mar 2012 10:22:27 AM CST -0.379418 seconds

时间居然都是正确的！显然问题没有那么简单。。。那到底是怎么回事？？还是从date开始查起吧：

date是一个glibc里面的程序，那它的时间源是从哪里来的？查看一下glibc的源码，它的时间从这里来：

// glibc coreutils gettime.c

/* Get the system time into *TS. */

void

gettime (struct timespec *ts)

{

#if HAVE_NANOTIME

nanotime (ts);

#else

# if defined CLOCK_REALTIME && HAVE_CLOCK_GETTIME

if (clock_gettime (CLOCK_REALTIME, ts) == 0)

return;

# endif

# if HAVE_GETTIMEOFDAY

{

struct timeval tv;

gettimeofday (&tv, NULL);

ts->tv_sec = tv.tv_sec;

ts->tv_nsec = tv.tv_usec * 1000;

}

# else

ts->tv_sec = time (NULL);

ts->tv_nsec = 0;

# endif

#endif

}

按照 nanotime, clock_gettime, gettimeofday的顺序去读取，我们这个系统中就是 clock_gettime，从上面的小程序中也可以看到clock_gettime的时间是有问题的，那么clock_gettime的时间源是怎么来的呢？

// linux kernel 2.6.18 arch/x86_64/kernel/time.c

void do_gettimeofday(struct timeval *tv)

{

unsigned long seq, t;

unsigned int sec, usec;

do {

seq = read_seqbegin(&xtime_lock);

sec = xtime.tv_sec;

usec = xtime.tv_nsec / NSEC_PER_USEC;

t = (jiffies - wall_jiffies) * USEC_PER_TICK +

do_gettimeoffset();

usec += t;

} while (read_seqretry(&xtime_lock, seq));

tv->tv_sec = sec + usec / USEC_PER_SEC;

tv->tv_usec = usec % USEC_PER_SEC;

}

反正就是一通计算，其中比较重要的是 do_gettimeoffset，这个函数的目的是计算上一次更新 xtime到当前执行这一段时间的偏移量，然后用这个偏移量去加到 xtime上去，就是当前的时间（因为xtime的精度不够，clock_gettime的精度要求更高，所以必须计算一个偏移量）。

那么，这个 do_gettimeoffset又是从哪来的？同样的文件里面有：

static inline unsigned int do_gettimeoffset_tsc(void)

{

unsigned long t;

unsigned long x;

t = get_cycles_sync();

if (t < vxtime.last_tsc)

t = vxtime.last_tsc; /* hack */

x = ((t - vxtime.last_tsc) * vxtime.tsc_quot) >> US_SCALE;

return x;

}

static inline unsigned int do_gettimeoffset_hpet(void)

{

/* cap counter read to one tick to avoid inconsistencies */

unsigned long counter = hpet_readl(HPET_COUNTER) - vxtime.last;

return (min(counter,hpet_tick) * vxtime.quot) >> US_SCALE;

}

unsigned int (*do_gettimeoffset)(void) = do_gettimeoffset_tsc;

也就是说，go_gettimeoffset 是从 tsc寄存器从拿到的，这又是神马玩意呢？参考一下下面的几篇文章【1】【2】【3】，基本就明白了：

tsc是一个cpu内的寄存器，用来累加从系统启动到目前，所有的时钟周期数，全称是 Timestamp Counter；

而且，最重要的是我们发现了一个重要的线索，那就是：

在多核cpu的架构中，tsc寄存器是很有可能不一致的，特别是在intel的比较老的多核CPU中！坑爹....

很多文章提到过这个问题，比如【1】【2】【4】，显然如果不同cpu读到的值不同的话，多次执行，显然时间是不一致的。

看起来就是这个问题了，那么就来验证一把：

#include<stdlib.h>

#include<stdio.h>

#include<sys/types.h>

#include<sys/sysinfo.h>

#include<unistd.h>

#include<time.h>

#define __USE_GNU

#include<sched.h>

#include<ctype.h>

#include<string.h>

int main(int argc, char **argv){

cpu_set_t cs;

CPU_ZERO(&cs);

CPU_SET(0, &cs);

sched_setaffinity(0, sizeof(cpu_set_t), &cs);

struct timespec ts;

int ret = 0;

ret = clock_gettime(CLOCK_REALTIME, &ts);

printf("ret: %d, time: %lld %lld", ret, ts.tv_sec, ts.tv_nsec);

}

gcc -g -o cpu0_clock_gettime cpu_getdate.c -lrt

ret: 0, time: 1332992815 537481000

ret: 0, time: 1332992816 506704000

ret: 0, time: 1332992817 9910000

ret: 0, time: 1332992817 480431000

ret: 0, time: 1332992817 936965000

ret: 0, time: 1332992818 416634000

ret: 0, time: 1332992818 850319000

ret: 0, time: 1332992819 305075000

ret: 0, time: 1332992819 729108000

ret: 0, time: 1332992820 164726000

ret: 0, time: 1332992820 577573000

ret: 0, time: 1332992820 985025000

ret: 0, time: 1332992821 394070000

ret: 0, time: 1332992821 816708000

ret: 0, time: 1332992822 216811000

ret: 0, time: 1332992822 616901000

ret: 0, time: 1332992823 18564000w

ret: 0, time: 1332992823 727585000

通过把程序的调度固定在某一个具体的cpu中，那么clock_gettime的结果也是正确的！

经过几次测试，很容易找到两个cpu时间不一致的情况，具体这个例子来说就是 cpu0和cpu5，把上面的程序修改一下，可以看到结果基本上每次都会跳一把。

问题解决：

那么，tsc在多核cpu中的值不同，应该怎么解决呢？如【4】所示，在高版本的linux内核中，有人提交了补丁，可以在内核启动时检查并且同步一把，那么升级一下内核到 2.6.38。

重新启动到新内核之后，问题解决！

总结一下：

1. apache的时间统计方式，没具体看实现，但是显然前后的两个时间可能是在不同的cpu核上计算出来的。

2. tsc的值，并不能保证是同步的，特别是在某些Intel cpu的情况下；但是新的内核会做一些检查和resync的工作；

3. time()调用直接读取内核的xtime值，不会触及硬件；但是 gettimeofday()， clock_gettime() 在 x86_64下都会读取tsc值；

4. hwclock读的硬件时钟显然不是 tsc，应该是实时时钟RTC；

【1】 http://en.wikipedia.org/wiki/Time_Stamp_Counter

【2】 http://juliusdavies.ca/posix_clocks/clock_realtime_linux_faq.html

【4】http://lwn.net/Articles/211051/ x86: unify/rewrite SMP TSC sync code

« 2026年4月 »
一	二	三	四	五	六	日
		1	2	3	4	5
6	7	8	9	10	11	12
13	14	15	16	17	18	19
20	21	22	23	24	25	26
27	28	29	30