[Linuxptp-users] Need help debugging failed clock synchronization

Discussion:

John Hubbard

2016-03-15 23:14:32 UTC

Apologies if this has already been asked and answered. I tried to look
for solutions to my problem in the mailing list archive, but when I
click the list archive link on the mailman page, I get a sourceforge
page telling me Error 403 "Read access required".

I'm trying to configure a machine running CentOS 7 (3.10 kernel) with an
Intel 82574L NIC to use PTP as its time source. I was able to
successfully do this with another CentOS 7 machine (Intel i350 NIC) but
I'm having problems with this new system. In both cases the PTP Master
is a Spectracom SecureSync PTP Grand Master. I've followed Red Hat's
directions [1] for configuring PTP. My ptp4l options are
"-f /etc/ptp4l.conf -i eno1 -A" and my phc2sys options are "-a -r -u 60".
My ptp4l.conf file is the CentOS 7 default and the same on both
systems. I can supply that if you think it'll be useful. The master is
connected to the problem machine through a non-boundary switch;
specifically an HP-ProCurve 2910al-24g. The other machine is connected
through that same switch plus a non-boundary Cisco switch, and at least
two or three more switches of unknown manufacturers.

My log shows two repeating ptp4l log messages [2] with the master offset
counting slowly upwards. The path delay is kind of stable but always
negative. What does a negative path delay mean? The message about
clock jump: is that saying that the ptp master clock has jumped
forward/running fast, or is it referring to the system clock or a
hardware clock? Overall, does anyone have any suggestions for what might
be wrong? FWIW, [3] shows the phc2sys log messages.
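On the negative path delay question: with ptp4l's default end-to-end delay mechanism, the slave combines four timestamps (t1: master sends Sync, t2: slave receives it, t3: slave sends Delay_Req, t4: master receives it) using the standard IEEE 1588 formulas. The minimal sketch below is an illustration, not linuxptp code, and the function name is hypothetical. It shows that a constant offset between the two clocks cancels out of the delay, while a slave clock that jumps forward between t2 and t3 drives the computed delay negative, which is one plausible way to end up with a value like the one in the log.

```python
from typing import Tuple


def delay_and_offset(t1: int, t2: int, t3: int, t4: int) -> Tuple[float, float]:
    """Return (mean_path_delay, offset_from_master) in ns.

    t1: Sync transmit time, master clock
    t2: Sync receive time, slave clock
    t3: Delay_Req transmit time, slave clock
    t4: Delay_Req receive time, master clock
    """
    master_to_slave = t2 - t1
    slave_to_master = t4 - t3
    delay = (master_to_slave + slave_to_master) / 2
    offset = (master_to_slave - slave_to_master) / 2
    return delay, offset


# Healthy exchange: 50 us each way, clocks aligned.
print(delay_and_offset(0, 50_000, 100_000, 150_000))  # (50000.0, 0.0)

# Slave a constant 1 s ahead: the offset absorbs it, the delay is unchanged.
print(delay_and_offset(0, 1_000_050_000, 1_000_100_000, 150_000))  # (50000.0, 1000000000.0)

# Slave clock jumps 2 ms forward between t2 and t3: the delay goes negative.
print(delay_and_offset(0, 50_000, 2_100_000, 150_000))  # (-950000.0, 1000000.0)
```

In other words, a persistently negative delay suggests the slave-side timestamps are not consistent with each other, rather than a real property of the network path.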

Thanks in advance

[1]
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/System_Administrators_Guide/ch-Configuring_PTP_Using_ptp4l.html

[2]
Mar 15 15:35:47 statler ptp4l[2628]: [2582.823] clockcheck: clock jumped forward or running faster than expected!
Mar 15 15:37:37 statler ptp4l[2628]: [2693.041] master offset 993697857563 s0 freq +23999999 path delay -713598018

[3]
Mar 15 15:31:22 statler systemd[1]: Started Synchronize system clock or PTP hardware clock (PHC).
Mar 15 15:31:33 statler phc2sys[773]: [2332.991] port 002590.fffe.a1f6a1-1 changed state
Mar 15 15:31:33 statler phc2sys[773]: [2332.991] reconfiguring after port state change
Mar 15 15:31:33 statler phc2sys[773]: [2332.991] selecting CLOCK_REALTIME for synchronization
Mar 15 15:31:33 statler phc2sys[773]: [2332.991] selecting eno1 as the master clock
Mar 15 15:31:38 statler phc2sys[773]: [2333.991] port 002590.fffe.a1f6a1-1 changed state
Mar 15 15:31:38 statler phc2sys[773]: [2333.991] reconfiguring after port state change
Mar 15 15:31:38 statler phc2sys[773]: [2333.991] master clock not ready, waiting...
--
-john

To be or not to be, that is the question
2b || !2b
(0b10)*(0b1100010) || !(0b10)*(0b1100010)
0b11000100 || !0b11000100
0b11000100 || 0b00111011
0b11111111
255, that is the answer.

Ledda William EXT

2016-03-16 08:20:33 UTC

Hello,
I remember that there was some discussion of the “clock jumped forward or running faster than expected” message. It could be a problem in the driver. AFAIK the Intel i350 uses the igb driver, while the 82574L uses e1000.

So… which driver are you using (run “ethtool -i <interface>” as root)? Have you tried running ptp4l without phc2sys?
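If ethtool is not at hand, the driver bound to an interface can also be read from sysfs, where /sys/class/net/<iface>/device/driver is a symlink into the driver's directory. The sketch below assumes that standard sysfs layout; the `sysfs_root` parameter is a hypothetical addition so the function can be exercised against a fake tree. The name it returns normally matches the "driver" field of `ethtool -i`, but that is an assumption, not a guarantee for every driver.

```python
import os


def nic_driver(iface: str, sysfs_root: str = "/sys") -> str:
    """Return the kernel driver bound to a network interface.

    Resolves the /sys/class/net/<iface>/device/driver symlink and returns
    the last path component, e.g. "e1000e" or "igb".
    """
    link = os.path.join(sysfs_root, "class", "net", iface, "device", "driver")
    return os.path.basename(os.readlink(link))
```

On the machine in question, `nic_driver("eno1")` would be expected to name the driver the 82574L is actually using.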

William

Richard Cochran

2016-03-16 10:53:09 UTC

Post by Ledda William EXT
Hello,
I remember that there were some discussion on the “clock jumped
forward or running faster than expected”. It could be a problem in
the driver. AFAIK Intel i350 uses igb driver, while 82574L e1000.

The 82574 needs the e1000e driver.

Post by Ledda William EXT
So… Which driver are you using (run “ethtool -i <interface>” as
root)? Have you tried to run ptp4l without phc2sys?

Right, phc2sys depends on ptp4l working properly. So, to debug your
new HW, first run ptp4l all by itself.

Thanks,
Richard

Richard Cochran

2016-03-16 10:50:20 UTC

Post by Ledda William EXT
Apologies if this has already been asked and answered. I tried to look for
solutions to my problem in the mailing list archive, but when I click the
list archive link on the mailman page, I get a sourceforge page telling me
Error 403 "Read access required".

Yes, SF is mostly broken. Please use gmane for the archives.

http://news.gmane.org/gmane.comp.linux.ptp.user

http://news.gmane.org/gmane.comp.linux.ptp.devel

Post by Ledda William EXT
I'm trying to configure a machine running CentOS 7 (3.10 kernel) with an
Intel 82574L NIC to use PTP as its time source.

There are two Linux kernel driver workarounds for that unlucky card:

5e7ff97004 v3.16-rc1 e1000e: 82574/82583 TimeSync errata for SYSTIM read
37b12910dd v4.3-rc1 e1000e: Fix tight loop implementation of systime read algorithm

You should try a newer kernel (4.3+) or use the Intel out-of-tree
drivers from SF.
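A quick way to see which of the two upstream workarounds a mainline kernel of a given version would contain is to compare `uname -r` against the first release carrying each commit (3.16 and 4.3, per the commit list above). This is a sketch under one important assumption: it only reasons about mainline version numbers. Distribution kernels such as CentOS 7's 3.10 series backport fixes, so an empty result for a distro kernel does not prove the fixes are absent; the table and function names are hypothetical.

```python
import re
from typing import List, Tuple

# (commit, first mainline release containing it, summary), from the list above.
E1000E_82574_FIXES = [
    ("5e7ff97004", (3, 16), "82574/82583 TimeSync errata for SYSTIM read"),
    ("37b12910dd", (4, 3), "Fix tight loop implementation of systime read algorithm"),
]


def kernel_version(release: str) -> Tuple[int, int]:
    """Extract (major, minor) from a `uname -r` string like 4.5.0-1.el7.elrepo.x86_64."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def fixes_in_mainline(release: str) -> List[str]:
    """Commits a mainline kernel of this version would include (ignores backports)."""
    version = kernel_version(release)
    return [commit for commit, first, _ in E1000E_82574_FIXES if version >= first]
```

For example, `fixes_in_mainline("4.5.0-1.el7.elrepo.x86_64")` lists both commits, while the stock `3.10.0-327.10.1.el7.x86_64` string yields an empty list only in the mainline sense.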

Thanks,
Richard

John Hubbard

2016-03-16 15:38:45 UTC

Post by Richard Cochran

Yes, SF is mostly broken. Please use gmane for the archives.
http://news.gmane.org/gmane.comp.linux.ptp.user
http://news.gmane.org/gmane.comp.linux.ptp.devel

Thanks for the hint. Looking through the archive, my problem might be
similar to Daniel Le's January thread "Master offsets don't converge".
However, it doesn't look like he ever resolved things, and he also
seems to have been using SW time stamping, whereas I believe my NIC
should be capable of HW time stamping.

Post by Richard Cochran

Post by Ledda William EXT
I'm trying to configure a machine running CentOS 7 (3.10 kernel) with an
Intel 82574L NIC to use PTP as its time source.

5e7ff97004 v3.16-rc1 e1000e: 82574/82583 TimeSync errata for SYSTIM read
37b12910dd v4.3-rc1 e1000e: Fix tight loop implementation of systime read algorithm
You should try a newer kernel (4.3+) or use the Intel out of tree
drivers from SF.

Thanks for the suggestions. I followed the instructions at [1] and I'm
now running with a 4.5 kernel.

[***@statler:~]$ uname -a
Linux statler 4.5.0-1.el7.elrepo.x86_64 #1 SMP Mon Mar 14 10:24:58 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux

I've disabled phc2sys for now. I tried restarting ptp4l and the log [2]
still shows the same clock jumped forward errors.

[2]
[***@statler:~]$ journalctl -u ptp4l -f
-- Logs begin at Wed 2016-03-16 07:48:05 MST. --
Mar 16 08:15:33 statler ptp4l[12591]: [242.851] port 0: INITIALIZING to LISTENING on INITIALIZE
Mar 16 08:15:32 statler systemd[1]: Stopping Precision Time Protocol (PTP) service...
Mar 16 08:15:33 statler systemd[1]: Started Precision Time Protocol (PTP) service.
Mar 16 08:15:33 statler systemd[1]: Starting Precision Time Protocol (PTP) service...
Mar 16 08:15:33 statler ptp4l[12591]: [243.204] port 1: new foreign master 000cec.fffe.080c09-1
Mar 16 08:15:37 statler ptp4l[12591]: [247.209] selected best master clock 000cec.fffe.080c09
Mar 16 08:15:37 statler ptp4l[12591]: [247.209] port 1: LISTENING to UNCALIBRATED on RS_SLAVE
Mar 16 08:15:37 statler ptp4l[12591]: [247.279] port 1: minimum delay request interval 2^4
Mar 16 08:15:39 statler ptp4l[12591]: [249.211] master offset -16769399087 s0 freq +23999998 path delay -1116866908
Mar 16 08:15:40 statler ptp4l[12591]: [250.213] master offset -13924642727 s1 freq +23999999 path delay -1116866908
Mar 16 08:15:41 statler ptp4l[12591]: [251.214] master offset 2750049109 s2 freq +23999999 path delay -1116866908
Mar 16 08:15:41 statler ptp4l[12591]: [251.214] port 1: UNCALIBRATED to SLAVE on MASTER_CLOCK_SELECTED
Mar 16 08:15:42 statler ptp4l[12591]: [252.215] clockcheck: clock jumped forward or running faster than expected!
Mar 16 08:15:42 statler ptp4l[12591]: [252.215] master offset 5502378494 s0 freq +23999999 path delay -1116866908

Messages continue with alternating "clockcheck: clock jumped" and
"master offset" messages. The freq is fixed, the master offset counts
slowly upwards, and the path delay remains negative with the occasional
small fluctuations.
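The symptoms described above (negative path delay, a frequency that never moves, the servo repeatedly dropping back to s0) can be pulled out of a ptp4l log mechanically. This is a minimal sketch assuming the "master offset ... sN freq ... path delay ..." line format shown in the log; the heuristics in `symptoms()` and all function names are hypothetical, chosen only to mirror what is described in this thread. In the servo state field, s0 is unlocked, s1 is a clock step and s2 is locked.

```python
import re
from typing import Dict, Iterable, List

# Matches ptp4l's per-sample servo line, for example:
#   master offset 5502378494 s0 freq +23999999 path delay -1116866908
SERVO_LINE = re.compile(
    r"master offset\s+(?P<offset>-?\d+)\s+s(?P<state>\d)\s+"
    r"freq\s+(?P<freq>[+-]?\d+)\s+path delay\s+(?P<delay>-?\d+)"
)


def parse_servo_lines(lines: Iterable[str]) -> List[Dict[str, int]]:
    """Extract offset (ns), servo state, freq (ppb) and path delay (ns) per sample."""
    samples = []
    for line in lines:
        match = SERVO_LINE.search(line)
        if match:
            samples.append({key: int(value) for key, value in match.groupdict().items()})
    return samples


def symptoms(samples: List[Dict[str, int]]) -> List[str]:
    """Flag the patterns described in this thread (hypothetical heuristics)."""
    found = []
    if any(s["delay"] < 0 for s in samples):
        found.append("negative path delay")
    freqs = [s["freq"] for s in samples]
    if len(freqs) > 1 and max(freqs) - min(freqs) <= 1:
        found.append("frequency pinned at one value")
    # After the first sample, a return to s0 means the servo was reset.
    if any(s["state"] == 0 for s in samples[1:]):
        found.append("servo fell back to s0")
    return found
```

Feeding it the output of `journalctl -u ptp4l` (for example via `splitlines()`) flags all three patterns for the hardware time stamping log above. Lines that are not servo samples, such as the clockcheck warnings, are simply skipped.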

[1]
http://linuxg.net/install-kernel-4-x-on-enterprise-linux-7-centos-7-and-rhel-7/
--
-john

Ledda William EXT

2016-03-16 16:45:54 UTC

What happens if you use SW time stamping instead of the HW one? Can you try compiling and installing manually the driver from Intel?

William

_______________________________________________
Linuxptp-users mailing list
Linuxptp-***@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linuxptp-users

John Hubbard

2016-03-16 17:20:35 UTC

Post by Ledda William EXT
What happens if you use SW time stamping instead of the HW one?

After changing the 'time_stamping' option in /etc/ptp4l.conf from
hardware to software and restarting ptp4l I now see much better
behavior. Below is the log output after giving it a little while to
settle. (This is still under the 4.5 kernel with the included e1000e
driver).

Mar 16 09:53:50 statler ptp4l[13014]: [6140.391] master offset -5072 s2 freq -21299 path delay 55357
Mar 16 09:53:51 statler ptp4l[13014]: [6141.393] master offset 7692 s2 freq -20015 path delay 55357
Mar 16 09:53:52 statler ptp4l[13014]: [6142.394] master offset -1163 s2 freq -20902 path delay 55357
Mar 16 09:53:53 statler ptp4l[13014]: [6143.396] master offset 5369 s2 freq -20243 path delay 55357
Mar 16 09:53:54 statler ptp4l[13014]: [6144.396] master offset -12270 s2 freq -22019 path delay 55357
Mar 16 09:53:55 statler ptp4l[13014]: [6145.398] master offset -18745 s2 freq -22685 path delay 55357
Mar 16 09:53:56 statler ptp4l[13014]: [6146.399] master offset 7707 s2 freq -20033 path delay 55357
Mar 16 09:53:57 statler ptp4l[13014]: [6147.401] master offset 7230 s2 freq -20073 path delay 55459
Mar 16 09:53:58 statler ptp4l[13014]: [6148.401] master offset 7093 s2 freq -20080 path delay 55459
Mar 16 09:53:59 statler ptp4l[13014]: [6149.403] master offset -1826 s2 freq -20973 path delay 55459
Mar 16 09:54:00 statler ptp4l[13014]: [6150.404] master offset 6597 s2 freq -20124 path delay 55459
Mar 16 09:54:01 statler ptp4l[13014]: [6151.405] master offset 5667 s2 freq -20212 path delay 55459
Mar 16 09:54:02 statler ptp4l[13014]: [6152.406] master offset -14483 s2 freq -22241 path delay 55459
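To put a number on "much better behavior" (and later on "more jitter than I'd like"), the master offsets from the software time stamping log can be summarised. This is a small sketch using only the 13 samples shown above; the function name is hypothetical and the statistics chosen (extremes, peak-to-peak, mean, RMS) are just one reasonable way to describe servo jitter.

```python
import math
from typing import Dict, Sequence

# "master offset" values (ns) copied from the software time stamping log above.
SW_OFFSETS = [-5072, 7692, -1163, 5369, -12270, -18745, 7707,
              7230, 7093, -1826, 6597, 5667, -14483]


def offset_stats(offsets_ns: Sequence[int]) -> Dict[str, float]:
    """Summarise servo offsets: extremes, peak-to-peak, mean and RMS (all in ns)."""
    n = len(offsets_ns)
    return {
        "min": min(offsets_ns),
        "max": max(offsets_ns),
        "peak_to_peak": max(offsets_ns) - min(offsets_ns),
        "mean": sum(offsets_ns) / n,
        "rms": math.sqrt(sum(x * x for x in offsets_ns) / n),
    }


print(offset_stats(SW_OFFSETS))
```

For these samples the offsets stay between -18745 ns and +7707 ns (about 26 µs peak to peak), which is consistent with the tens-of-microseconds scale one would expect from software time stamps rather than hardware ones.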

Post by Ledda William EXT
Can you try compiling and installing manually the driver from Intel?

I believe that I did try the Intel driver but didn't see any success. I
found version 3.3.3 of the driver at [3], followed the instructions in
the readme. At the time I was running the 3.10.0-327.10.1 kernel. The
timestamp (see below) on e1000e.ko matches up with when I performed the
build, and the file size is way bigger (6M as compared to ~780K) for the
ko on the older 3.10 and the newer 4.5 kernels. I did an rmmod (which
hung my SSH session), then rebooted the machine (which I assume loaded
the new driver). After having done all of that I saw the same clock
jumped forward messages, ever growing master offset, and negative path
delay. I then moved onto the new kernel.

-rw-r--r-- 1 root root 6.0M Mar 16 07:40 /usr/lib/modules/3.10.0-327.10.1.el7.x86_64/updates/drivers/net/ethernet/intel/e1000e/e1000e.ko
-rw-r--r--. 1 root root 381K Nov 19 15:52 /usr/lib/modules/3.10.0-327.el7.x86_64/kernel/drivers/net/ethernet/intel/e1000e/e1000e.ko
-rwxr--r-- 1 root root 377K Mar 14 08:37 /usr/lib/modules/4.5.0-1.el7.elrepo.x86_64/kernel/drivers/net/ethernet/intel/e1000e/e1000e.ko

[3] https://downloadcenter.intel.com/download/15817

--
-john

Richard Cochran

2016-03-16 19:54:03 UTC

Post by John Hubbard
After changing the 'time_stamping' option in /etc/ptp4l.conf from
hardware to software and restarting ptp4l I now see much better
behavior.

Yes, but probably you are disappointed having to forego the HW
synchronization performance. At least this test shows that your card
most likely has a HW bug.

Post by John Hubbard
I believe that I did try the Intel driver but didn't see any success. I
found version 3.3.3 of the driver at [3], followed the instructions in
the readme. At the time I was running the 3.10.0-327.10.1 kernel. The
timestamp (see below) on e1000e.ko matches up with when I performed the
build, and the file size is way bigger (6M as compared to ~780K) for the
ko on the older 3.10 and the newer 4.5 kernels. I did an rmmod (which
hung my SSH session) I then rebooted the machine (which I assume loaded
the new driver).

I wouldn't assume that. Either do rmmod/insmod by hand (on the
console!) or simply rename or move the original driver before
rebooting.
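After the rmmod/insmod or the rename-and-reboot, one way to confirm which e1000e build is actually running is to read the version of the loaded module from /sys/module/<name>/version. Unlike `modinfo`, which inspects a .ko file on disk, this reflects what the running kernel has loaded. This sketch assumes the module declares a version string (the in-tree and Intel builds reported 3.2.x-k and 3.3.3-NAPI in this thread); if the module is not loaded, the file is normally absent and `open()` raises FileNotFoundError. The function names and `sysfs_root` parameter are hypothetical.

```python
import os


def loaded_module_version(module: str, sysfs_root: str = "/sys") -> str:
    """Version string of a currently loaded module, from /sys/module/<name>/version."""
    path = os.path.join(sysfs_root, "module", module, "version")
    with open(path) as f:
        return f.read().strip()


def driver_version_matches(expected: str, module: str = "e1000e",
                           sysfs_root: str = "/sys") -> bool:
    """True if the loaded module reports the expected version, e.g. "3.3.3-NAPI"."""
    return loaded_module_version(module, sysfs_root) == expected
```

On the 3.10.0-327.10.1 kernel, `driver_version_matches("3.3.3-NAPI")` returning False would indicate that the stock driver, not the freshly built one, was still in use.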

Thanks,
Richard

John Hubbard

2016-03-16 20:45:00 UTC

Post by Richard Cochran

Post by John Hubbard
After changing the 'time_stamping' option in /etc/ptp4l.conf from
hardware to software and restarting ptp4l I now see much better
behavior.

Yes, but probably you are disappointed having to forego the HW
synchronization performance. At least this test shows that your card
most likely has a HW bug.

If possible, it would be really nice to get HW time stamping working
on this system. I can move to another system if needed, but getting this
working would help me in the short term. (There are no expansion ports,
or I'd just pick up another NIC.) On a related note, do you or anyone
else on the list know how well the Intel X540 (10Gb NIC using the ixgbe
driver) is supported WRT ptp4l?

Post by Richard Cochran

I wouldn't assume that. Either do rmmod/insmod by hand (on the
console!) or simply rename or move the original driver before
rebooting.

OK the machine has got three kernels installed. Here's the e1000e
driver version (as reported by modinfo) for each:

Kernel 3.10.0-327 e1000e version 3.2.5-k
Kernel 3.10.0-327.10.1 e1000e version 3.3.3-NAPI
Kernel 4.5.0-1 e1000e version 3.2.6-k

Under all three kernels with software time stamping things 'work' but
with more jitter than I'd like to see. With hardware time stamping
things don't work. Specifically I see clock jumped forward messages and
an ever increasing master offset.
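As for what the "clock jumped forward" message refers to: as I understand it, it comes from ptp4l's clock sanity check on the local clock ptp4l is steering (the NIC's PHC when hardware time stamping is used), measured against CLOCK_MONOTONIC. ptp4l(8) describes `sanity_freq_limit` as the maximum allowed frequency offset between the uncorrected clock and the system monotonic clock, with a default of 200000000 ppb (20%). When it is exceeded, a warning is printed and the servo is reset, which would fit the s0 that follows each clockcheck message in the logs. The sketch below is a simplified illustration of that idea, not linuxptp's implementation; the function name and arguments are hypothetical.

```python
from typing import Optional

# Default of ptp4l's sanity_freq_limit option, in ppb (20%).
DEFAULT_SANITY_FREQ_LIMIT_PPB = 200_000_000


def clockcheck(clock_interval_ns: int, mono_interval_ns: int, applied_adj_ppb: int,
               limit_ppb: int = DEFAULT_SANITY_FREQ_LIMIT_PPB) -> Optional[str]:
    """Simplified sanity check of a steered clock against CLOCK_MONOTONIC.

    clock_interval_ns: how far the steered clock advanced between two samples
    mono_interval_ns: how far CLOCK_MONOTONIC advanced over the same period
    applied_adj_ppb: frequency adjustment the servo itself applied
    """
    measured_ppb = (clock_interval_ns - mono_interval_ns) * 1_000_000_000 // mono_interval_ns
    # Whatever the servo's own adjustment does not explain is treated as suspect.
    unexplained_ppb = measured_ppb - applied_adj_ppb
    if unexplained_ppb > limit_ppb:
        return "clock jumped forward or running faster than expected!"
    if unexplained_ppb < -limit_ppb:
        return "clock jumped backward or running slower than expected!"
    return None
```

For example, a PHC that advances 1.5 s while CLOCK_MONOTONIC advances 1 s is about 500000000 ppb fast, far more than the +23999999 ppb the servo was applying, so the check fires.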
--
-john

Richard Cochran

2016-03-16 21:56:43 UTC

If possible it would be really nice to get the HW time-stamping working on
this system. I can move to another system if needed but getting this
working would help me in the short term.

The best advice I know would be to take the testptp program (from
linux/Documentation/ptp) and verify whether the HW clock is behaving
or not. For example: set the time, get the time, get the time in a loop,
set a frequency offset and compare the elapsed interval with system time, etc.
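testptp itself is the right tool here. For a rough second opinion on the "compare interval with system time" step, the sketch below reads the PHC through a dynamic POSIX clock id, computed from an open /dev/ptpN descriptor the same way testptp.c's FD_TO_CLOCKID macro does, and compares its rate with CLOCK_MONOTONIC. Assumptions: /dev/ptp0 is the 82574's clock (`ethtool -T eno1` reports the PTP Hardware Clock index), Python 3.7+ passes the dynamic clock id through to clock_gettime unchanged on Linux, and ptp4l is stopped so it is not steering the PHC while sampling. It does not set time or frequency, and the helper names are hypothetical.

```python
import os
import time
from typing import List, Tuple

CLOCKFD = 3  # low bits marking a dynamic (fd-based) POSIX clock


def fd_to_clockid(fd: int) -> int:
    """Dynamic clock id for an open /dev/ptpN descriptor, as in testptp.c."""
    return ((~fd) << 3) | CLOCKFD


def sample_phc(device: str = "/dev/ptp0", count: int = 10,
               interval_s: float = 1.0) -> List[Tuple[int, int]]:
    """Collect (phc_ns, monotonic_ns) pairs; needs read access to the device."""
    fd = os.open(device, os.O_RDONLY)
    try:
        clkid = fd_to_clockid(fd)
        pairs = []
        for _ in range(count):
            phc = time.clock_gettime_ns(clkid)
            mono = time.clock_gettime_ns(time.CLOCK_MONOTONIC)
            pairs.append((phc, mono))
            time.sleep(interval_s)
        return pairs
    finally:
        os.close(fd)


def drift_ppb(pairs: List[Tuple[int, int]]) -> float:
    """Rate of the PHC relative to CLOCK_MONOTONIC between first and last sample."""
    (phc0, mono0), (phc1, mono1) = pairs[0], pairs[-1]
    mono_elapsed = mono1 - mono0
    return ((phc1 - phc0) - mono_elapsed) * 1e9 / mono_elapsed


def backward_steps(pairs: List[Tuple[int, int]]) -> int:
    """Count consecutive samples where the PHC reading went backwards."""
    return sum(1 for (a, _), (b, _) in zip(pairs, pairs[1:]) if b < a)
```

A healthy, unsteered PHC should show a small, stable `drift_ppb(sample_phc())` from run to run and zero backward steps; large or erratic values would point at the SYSTIM read errata discussed earlier in the thread.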

Thanks,
Richard

Keller, Jacob E

2016-03-22 22:35:55 UTC

Hi John,

Post by John Hubbard

Post by Richard Cochran

Post by John Hubbard
After changing the 'time_stamping' option in /etc/ptp4l.conf from
hardware to software and restarting ptp4l I now see much better
behavior.

Yes, but probably you are disappointed having to forego the HW
synchronization performance. At least this test shows that your card
most likely has a HW bug.

If possible it would be really nice to get the HW time-stamping working
on this system. I can move to another system if needed but getting this
working would help me in the short term. (No expansion ports or I'd
just pick up another NIC. On a related note do you or anyone else on
the list know how well the Intel X540 (10Gb NIC using the ixgbe driver)
is supported WRT ptp4l?

The X540 device should be supported WRT ptp4l, and as far as I know it
works quite well. I am sorry for the troubles the e1000e adapter is
causing. It is most likely a driver issue. I am not 100% sure who is
responsible for that driver now, but I will attempt to determine if the
latest errata have been released on SourceForge yet. (It can be slow
sometimes)

Post by John Hubbard

Post by Richard Cochran

Post by John Hubbard
I believe that I did try the Intel driver but didn't see any success. I
found version 3.3.3 of the driver at [3], followed the instructions in
the readme. At the time I was running the 3.10.0-327.10.1 kernel. The
timestamp (see below) on e1000e.ko matches up with when I performed the
build, and the file size is way bigger (6M as compared to ~780K) for the
ko on the older 3.10 and the newer 4.5 kernels. I did an rmmod (which
hung my SSH session) I then rebooted the machine (which I assume loaded
the new driver).

I wouldn't assume that. Either do rmmod/insmod by hand (on the
console!) or simply rename or move the original driver before
rebooting.

OK the machine has got three kernels installed. Here's the e1000e
driver version (as reported by modinfo) for each:
Kernel 3.10.0-327 e1000e version 3.2.5-k
Kernel 3.10.0-327.10.1 e1000e version 3.3.3-NAPI
Kernel 4.5.0-1 e1000e version 3.2.6-k
Under all three kernels with software time stamping things 'work' but
with more jitter than I'd like to see. With hardware time stamping
things don't work. Specifically I see clock jumped forward messages and
an ever increasing master offset.

As Richard suggested, I would use the testptp program to debug whether
you have a weird driver issue or not.

It is very likely an issue with the hardware for this part, as there
are several errata regarding the SYSTIM clock, as Richard noted
earlier.

Regards,
Jake

John Hubbard

2016-03-23 17:54:47 UTC

Post by Keller, Jacob E
Hi John,

Post by John Hubbard

Post by Richard Cochran

Post by John Hubbard
After changing the 'time_stamping' option in /etc/ptp4l.conf from
hardware to software and restarting ptp4l I now see much better
behavior.

Yes, but probably you are disappointed having to forego the HW
synchronization performance. At least this test shows that your card
most likely has a HW bug.

Thanks for the info about the X540. Please let me know if a newer
driver for the 82574L ends up on SF.
--
-john

Keller, Jacob E

2016-03-23 18:59:13 UTC

Hi John,

It looks like you should have the latest driver (3.3.3) already. If you could isolate the problem using testptp from the Documentation/ptp folder of the kernel tree, using the sourceforge e1000e driver, and show that it is having issues, then we can get that reported to the team that owns e1000e, and hopefully they can determine what needs to be fixed.

Thanks,
Jake
