Discussion:
[Linuxptp-users] Failed synchronization
Daniel Le
2016-08-04 19:36:45 UTC
Permalink
Hello,

I have frequent synchronization issue on a PTP ordinary clock which has Linux kernel 3.18.12 and LinuxPTP v1.6. It runs in slave mode with software timestamping.

To force the problem to occur every time, I change the system clock to be approximately 5 minutes ahead of PTP Grandmaster clock. In the following, the (enhanced) log messages show ptp4l correctly steps the system clock with the observed offset, then transitions into SERVO_LOCKED state, however the master offset as seen by ptp4l keeps increasing afterwards. The slew and PI reset operations seem not to be able to keep up.

Any help is appreciated.
Daniel

ptp4l[137.667]: port 1: INITIALIZING to LISTENING on INITIALIZE
ptp4l[137.667]: port 0: INITIALIZING to LISTENING on INITIALIZE
ptp4l[137.864]: port 1: new foreign master 00b0ae.fffe.02d104-1
ptp4l[141.864]: selected best master clock 00b0ae.fffe.02d104
ptp4l[141.864]: port 1: LISTENING to UNCALIBRATED on RS_SLAVE
ptp4l[143.090]: port 1: minimum delay request interval 2^-7
ptp4l[143.801]: master offset 292990930284 s0 freq +0 path delay 255286
....................................
ptp4l[158.800]: master offset 292990192008 s0 freq +0 path delay 2970
ptp4l[159.799]: master offset 292989435868 s1 freq -93410 path delay 255430
ptp4l[159.800]: ***Before step*** system time = Thu Aug 4 18:59:45 2016 839875 micros
ptp4l[159.800]: clockadj_step clkid=0 offset=-292989435868
ptp4l[159.800]: ***After step*** system time = Thu Aug 4 18:54:51 2016 850548 micros
ptp4l[160.799]: master offset 292989436892 s2 freq +100000000 path delay 255430 <<<=========
ptp4l[160.799]: clockadj_set_freq
ptp4l[160.800]: port 1: UNCALIBRATED to SLAVE on MASTER_CLOCK_SELECTED
ptp4l[160.945]: clockcheck: clock jumped forward or running faster than expected!
ptp4l[160.945]: pi_reset
....................................
ptp4l[161.700]: master offset 292989691148 s0 freq +100000000 path delay 2778
ptp4l[161.700]: port 1: SLAVE to UNCALIBRATED on SYNCHRONIZATION_FAULT
ptp4l[161.792]: clockcheck: clock jumped forward or running faster than expected!
ptp4l[161.792]: pi_reset
....................................
ptp4l[162.600]: master offset 292989692174 s0 freq +100000000 path delay 2904
ptp4l[162.650]: clockcheck: clock jumped forward or running faster than expected!
ptp4l[162.650]: pi_reset
....................................
ptp4l[163.500]: master offset 292989693962 s0 freq +100000000 path delay 2748
ptp4l[163.586]: clockcheck: clock jumped forward or running faster than expected!
ptp4l[163.586]: pi_reset
....................................
ptp4l[164.400]: master offset 292989694906 s0 freq +100000000 path delay 2888
ptp4l[164.431]: clockcheck: clock jumped forward or running faster than expected!
ptp4l[164.431]: pi_reset
....................................
ptp4l[165.300]: master offset -147127914636 s0 freq +100000000 path delay 146705873134
ptp4l[165.382]: clockcheck: clock jumped forward or running faster than expected!
ptp4l[165.382]: pi_reset
....................................
ptp4l[166.200]: master offset -422042768 s0 freq +100000000 path delay 2870
ptp4l[166.231]: clockcheck: clock jumped forward or running faster than expected!
ptp4l[166.231]: pi_reset
....................................
ptp4l[167.100]: master offset -422041762 s0 freq +100000000 path delay 2980
ptp4l[167.180]: clockcheck: clock jumped forward or running faster than expected!
ptp4l[167.180]: pi_reset
....................................
ptp4l[167.999]: master offset -422040068 s0 freq +100000000 path delay 2886
ptp4l[168.031]: clockcheck: clock jumped forward or running faster than expected!
ptp4l[168.031]: pi_reset
....................................
ptp4l[168.899]: master offset -422039004 s0 freq +100000000 path delay 2910
ptp4l[168.966]: clockcheck: clock jumped forward or running faster than expected!
ptp4l[168.966]: pi_reset
....................................
ptp4l[169.799]: master offset -422037260 s0 freq +100000000 path delay 2770
ptp4l[170.001]: clockcheck: clock jumped forward or running faster than expected!
ptp4l[170.001]: pi_reset
ptp4l[170.699]: master offset -585402570 s0 freq +100000000 path delay 51353036
ptp4l[171.599]: master offset -746676648 s0 freq +100000000 path delay 69771338
ptp4l[172.499]: master offset -887764240 s0 freq +100000000 path delay 68002674
....................................
Richard Cochran
2016-08-04 20:34:00 UTC
Permalink
Post by Daniel Le
I have frequent synchronization issue on a PTP ordinary clock which
has Linux kernel 3.18.12 and LinuxPTP v1.6. It runs in slave mode
with software timestamping.
You didn't tell us what hardware your are using: platform, NIC, driver?
Post by Daniel Le
To force the problem to occur every time, I change the system clock
to be approximately 5 minutes ahead of PTP Grandmaster clock. In the
following, the (enhanced) log messages show ptp4l correctly steps
the system clock with the observed offset, then transitions into
SERVO_LOCKED state, however the master offset as seen by ptp4l keeps
increasing afterwards. The slew and PI reset operations seem not to
be able to keep up.
After stepping the clock, the offset remains the same. Your system
clock or the MAC driver is somehow broken.

Also, the frequency estimation that occurs between UNCALIBRATED and
SLAVE runs off the scale. Is your local oscillator really that bad?

No idea how your computer could be so messed up.

Sorry,
Richard

------------------------------------------------------------------------------
Daniel Le
2016-08-04 22:13:35 UTC
Permalink
Hi Richard, please see follow-up comments and questions inline.

-----Original Message-----
From: Richard Cochran [mailto:***@gmail.com]
Sent: Thursday, August 04, 2016 4:34 PM
To: Daniel Le <***@exfo.com>
Cc: linuxptp-***@lists.sourceforge.net
Subject: Re: [Linuxptp-users] Failed synchronization
Post by Daniel Le
I have frequent synchronization issue on a PTP ordinary clock which
has Linux kernel 3.18.12 and LinuxPTP v1.6. It runs in slave mode with
software timestamping.
You didn't tell us what hardware your are using: platform, NIC, driver?
[DL] It's an embedded system with proprietary FPGA based NIC and driver.
Post by Daniel Le
To force the problem to occur every time, I change the system clock to
be approximately 5 minutes ahead of PTP Grandmaster clock. In the
following, the (enhanced) log messages show ptp4l correctly steps the
system clock with the observed offset, then transitions into
SERVO_LOCKED state, however the master offset as seen by ptp4l keeps
increasing afterwards. The slew and PI reset operations seem not to be
able to keep up.
After stepping the clock, the offset remains the same. Your system clock or the MAC driver is somehow broken.
[DL] Is tsproc_update_offset function a good place where to check t1, t2, t3, t4 values
in order to see which timestamp(s) may cause the offset to remain the same?
[DL] In this function, are master timestamps (t1, t4) TAI and slave timestamps (t2, t3) UTC?

Also, the frequency estimation that occurs between UNCALIBRATED and SLAVE runs off the scale. Is your local oscillator really that bad?
[DL] I can't explain why... but if the system is rebooted then synchronization works fine, while a restart of ptp4l doesn't make a difference.

No idea how your computer could be so messed up.

Sorry,
Richard

------------------------------------------------------------------------------
Richard Cochran
2016-08-05 07:40:42 UTC
Permalink
Post by Richard Cochran
You didn't tell us what hardware your are using: platform, NIC, driver?
[DL] It's an embedded system with proprietary FPGA based NIC and driver.
Sigh.

This is the same system you were asking us about back in April 2015.
If you have HW time stamping, then you should implement a proper PHC.
The fact that you are using SW time stamping in the application is a
sure sign that you have it all wrong.
Post by Richard Cochran
After stepping the clock, the offset remains the same. Your system clock or the MAC driver is somehow broken.
[DL] Is tsproc_update_offset function a good place where to check t1, t2, t3, t4 values
in order to see which timestamp(s) may cause the offset to remain the same?
Try looking in your driver.
Post by Richard Cochran
[DL] In this function, are master timestamps (t1, t4) TAI and slave timestamps (t2, t3) UTC?
Just how are we supposed to know?

You have some kind of weird custom system in which you abuse the whole
PHC subsystem. We can't help you with that.

Sorry,
Richard

------------------------------------------------------------------------------
Daniel Le
2016-08-10 20:48:59 UTC
Permalink
Hi Richard,

It's a new hardware platform and newer 3.x Linux kernel, however the integration approach is same.

I changed to rely on software timestamps instead and synchronization works fine.

When our FPGA team will make our NIC 1588 aware and implement a PHC, I will use hardware timestamping.

Regards,
Daniel

-----Original Message-----
From: Richard Cochran [mailto:***@gmail.com]
Sent: Friday, August 05, 2016 3:41 AM
To: Daniel Le <***@exfo.com>
Cc: linuxptp-***@lists.sourceforge.net
Subject: Re: [Linuxptp-users] Failed synchronization
Post by Richard Cochran
You didn't tell us what hardware your are using: platform, NIC, driver?
[DL] It's an embedded system with proprietary FPGA based NIC and driver.
Sigh.

This is the same system you were asking us about back in April 2015.
If you have HW time stamping, then you should implement a proper PHC.
The fact that you are using SW time stamping in the application is a sure sign that you have it all wrong.
Post by Richard Cochran
After stepping the clock, the offset remains the same. Your system clock or the MAC driver is somehow broken.
[DL] Is tsproc_update_offset function a good place where to check t1, t2, t3, t4 values
in order to see which timestamp(s) may cause the offset to remain the same?
Try looking in your driver.
Post by Richard Cochran
[DL] In this function, are master timestamps (t1, t4) TAI and slave timestamps (t2, t3) UTC?
Just how are we supposed to know?

You have some kind of weird custom system in which you abuse the whole PHC subsystem. We can't help you with that.

Sorry,
Richard

Continue reading on narkive:
Loading...