[Linuxptp-users] Master offsets don't converge
Daniel Le
2016-01-15 16:19:23 UTC
Hello,

My ptp4l version 1.4 in software timestamping mode works fine with Linux kernel 2.6.35. However, when I switch to kernel 3.18.12 (and a new Ethernet driver), the master offsets are huge and never converge. Any pointers for debugging this are much appreciated.

/ #ptp4l -f /etc/ptp4l.conf
ptp4l[250704.924]: port 1: INITIALIZING to LISTENING on INITIALIZE
ptp4l[250704.924]: port 0: INITIALIZING to LISTENING on INITIALIZE
ptp4l[250705.355]: port 1: new foreign master 00b0ae.fffe.02d103-1
ptp4l[250708.955]: selected best master clock 00b0ae.fffe.02d103
ptp4l[250708.955]: port 1: LISTENING to UNCALIBRATED on RS_SLAVE
ptp4l[250709.856]: port 1: minimum delay request interval 2^-7
ptp4l[250710.698]: master offset1 -6601404463576 s0 freq +100000000 path delay 220834
ptp4l[250711.598]: master offset1 -6601404940762 s0 freq +100000000 path delay 224676
ptp4l[250712.498]: master offset1 -6601405412898 s0 freq +100000000 path delay 223500
ptp4l[250713.398]: master offset1 -6601405890510 s0 freq +100000000 path delay 227796
ptp4l[250714.298]: master offset1 -6601406361480 s0 freq +100000000 path delay 225458
ptp4l[250715.198]: master offset1 -6601406835542 s0 freq +100000000 path delay 226236
ptp4l[250716.098]: master offset1 -6601407311244 s0 freq +100000000 path delay 228594
ptp4l[250716.998]: master offset1 -6601407784176 s0 freq +100000000 path delay 228218
ptp4l[250717.898]: master offset1 -6601408255930 s0 freq +100000000 path delay 226660
ptp4l[250718.798]: master offset1 -6601408732050 s0 freq +100000000 path delay 229468
ptp4l[250719.698]: master offset1 -6601409205854 s0 freq +100000000 path delay 229956
ptp4l[250720.598]: master offset1 -6601409673392 s0 freq +100000000 path delay 224186
ptp4l[250721.497]: master offset1 -6601410151822 s0 freq +100000000 path delay 229300
ptp4l[250722.397]: master offset1 -6601410625760 s0 freq +100000000 path delay 229898
ptp4l[250723.297]: master offset1 -6601411100194 s0 freq +100000000 path delay 231052
ptp4l[250724.197]: master offset1 -6601411564234 s0 freq +100000000 path delay 221780
ptp4l[250725.097]: master offset1 -6601412044146 s1 freq +99573391 path delay 228348
ptp4l[250725.208]: clockcheck: clock jumped forward or running faster than expected!
ptp4l[250725.319]: clockcheck: clock jumped forward or running faster than expected!
ptp4l[250725.426]: clockcheck: clock jumped forward or running faster than expected!
ptp4l[250725.536]: clockcheck: clock jumped forward or running faster than expected!
ptp4l[250725.638]: clockcheck: clock jumped forward or running faster than expected!
ptp4l[250725.741]: clockcheck: clock jumped forward or running faster than expected!
ptp4l[250725.853]: clockcheck: clock jumped forward or running faster than expected!
ptp4l[250725.965]: clockcheck: clock jumped forward or running faster than expected!
ptp4l[250725.998]: master offset1 -2711305334 s0 freq +99573391 path delay 228348
ptp4l[250726.068]: clockcheck: clock jumped forward or running faster than expected!
ptp4l[250726.176]: clockcheck: clock jumped forward or running faster than expected!
ptp4l[250726.280]: clockcheck: clock jumped forward or running faster than expected!
ptp4l[250726.387]: clockcheck: clock jumped forward or running faster than expected!
ptp4l[250726.491]: clockcheck: clock jumped forward or running faster than expected!
ptp4l[250726.605]: clockcheck: clock jumped forward or running faster than expected!
ptp4l[250726.713]: clockcheck: clock jumped forward or running faster than expected!
ptp4l[250726.814]: clockcheck: clock jumped forward or running faster than expected!
ptp4l[250726.898]: master offset1 -2711772472 s0 freq +99573391 path delay 222142
ptp4l[250726.916]: clockcheck: clock jumped forward or running faster than expected!
ptp4l[250727.020]: clockcheck: clock jumped forward or running faster than expected!
ptp4l[250727.120]: clockcheck: clock jumped forward or running faster than expected!

Thanks,
Daniel
Richard Cochran
2016-01-15 18:03:42 UTC
Post by Daniel Le
My ptp4l version 1.4 in software timestamping mode works fine with a
Linux kernel 2.6.35, however when I switch to the kernel 3.18.12
(and new Ethernet driver), I see the master offsets are huge and
never converge. Any pointer to debug this is much appreciated.
1. Start with vanilla 1.4 and verify correct operation.
2. Add your first (next) minimal change.
3. Correct operation? If yes, go to step 2.
4. If not, you have found the bug.

If you have lots of changes, then use git bisect.
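
The procedure above is a linear search, and bisection is the same idea as a binary search. Below is a minimal sketch, assuming the changes can be applied in a fixed order and that once a change breaks operation, every later state stays broken. The names `first_bad_change` and `works_with` are hypothetical; `works_with(n)` stands for "build and test with the first n changes applied".

```python
from typing import Callable, Optional, Sequence


def first_bad_change(changes: Sequence[str],
                     works_with: Callable[[int], bool]) -> Optional[str]:
    """Return the first change that breaks operation, or None if none does.

    works_with(n) reports whether the system works with the first n
    changes applied; n == 0 is the vanilla baseline.
    """
    if not works_with(0):
        raise ValueError("vanilla baseline does not work; check that first")
    if works_with(len(changes)):
        return None

    # Invariant: works_with(lo) is True and works_with(hi) is False.
    lo, hi = 0, len(changes)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if works_with(mid):
            lo = mid
        else:
            hi = mid
    # Applying change number hi (index hi - 1) is what broke things.
    return changes[hi - 1]


if __name__ == "__main__":
    # Made-up change names, purely for illustration.
    changes = ["change-a", "change-b", "change-c", "change-d", "change-e"]
    print(first_bad_change(changes, lambda n: n < 3))  # change-c
```

When the changes are git commits, `git bisect` runs this search for you; the sketch only shows why it needs far fewer test runs than adding one change at a time.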

HTH,
Richard
Keller, Jacob E
2016-01-15 18:27:45 UTC
Post by Daniel Le
Hello,
 
My ptp4l version 1.4 in software timestamping mode works fine with a
Linux kernel 2.6.35, however when I switch to the kernel 3.18.12 (and
new Ethernet driver), I see the master offsets are huge and never
converge. Any pointer to debug this is much appreciated.
 
You say this is software timestamping? What's your configuration? With
such a large kernel change I would suspect a driver bug, but that
wouldn't be the case if you're using pure software timestamping. Can
you post your ptp4l.conf file?


Are you using only unmodified upstream versions? If you're using any
modifications, I would bisect through those, confirming that the
vanilla versions work just fine.

Regards,
Jake
Post by Daniel Le
/ #ptp4l -f /etc/ptp4l.conf
ptp4l[250704.924]: port 1: INITIALIZING to LISTENING on INITIALIZE
ptp4l[250704.924]: port 0: INITIALIZING to LISTENING on INITIALIZE
ptp4l[250705.355]: port 1: new foreign master 00b0ae.fffe.02d103-1
ptp4l[250708.955]: selected best master clock 00b0ae.fffe.02d103
ptp4l[250708.955]: port 1: LISTENING to UNCALIBRATED on RS_SLAVE
ptp4l[250709.856]: port 1: minimum delay request interval 2^-7
ptp4l[250710.698]: master offset1 -6601404463576 s0 freq +100000000 path delay    220834
ptp4l[250711.598]: master offset1 -6601404940762 s0 freq +100000000 path delay    224676
ptp4l[250712.498]: master offset1 -6601405412898 s0 freq +100000000 path delay    223500
This smells of a driver bug. Notice how the frequency shift is maxed,
and yet the clock is still drifting farther apart. This either means
that the real clock drift is *over* 10%, (which is very unlikely), or
there is a bug in the frequency tuning. But if you really are using
software timestamps, this doesn't make sense.
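
To put a number on "drifting farther apart", here is a small sketch that parses the three quoted log lines and computes how fast the reported offset is changing. The regular expression and the helper names are my own assumptions, written to match the log format shown in this thread only.

```python
import re

# The three master-offset lines quoted above.
LOG = """\
ptp4l[250710.698]: master offset1 -6601404463576 s0 freq +100000000 path delay 220834
ptp4l[250711.598]: master offset1 -6601404940762 s0 freq +100000000 path delay 224676
ptp4l[250712.498]: master offset1 -6601405412898 s0 freq +100000000 path delay 223500
"""

LINE = re.compile(
    r"ptp4l\[(\d+\.\d+)\]: master offset\d*\s+(-?\d+) s\d freq\s+([+-]?\d+)"
)


def parse(log):
    """Return a list of (timestamp_s, offset_ns, freq_ppb) tuples."""
    return [(float(m.group(1)), int(m.group(2)), int(m.group(3)))
            for m in LINE.finditer(log)]


def offset_rate_ns_per_s(samples):
    """Average change of the offset between the first and last sample."""
    t0, off0, _ = samples[0]
    t1, off1, _ = samples[-1]
    return (off1 - off0) / (t1 - t0)


samples = parse(LOG)
rate = offset_rate_ns_per_s(samples)
print(f"offset changes by {rate:.0f} ns/s; freq is pinned at {samples[-1][2]} ppb")
```

Over these three samples the offset moves by roughly -527,000 ns per second (1 ns/s is 1 ppb) while the frequency adjustment sits at +100000000 ppb, which is 10%.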


Again, if you're not using vanilla LinuxPTP 1.4, I would retry with
that and confirm the behavior. If you are using vanilla LinuxPTP, I
would confirm that you are in fact using software-only
timestamping.

Regards,
Jake
Daniel Le
2016-01-15 18:56:05 UTC
Below is my PTP configuration. It doesn't run in 'pure' software timestamping mode: although ptp4l is configured for software timestamping, the packet timestamps are provided by FPGA hardware on the NIC, which reads the host system time once per second and steps/slews to it. There may be a synchronization issue between the system clock, which is maintained by ptp4l, and the FPGA-based hardware clock. I am guessing that the large offsets are due to wrong timestamps, but I am not sure how best to debug it.

With the 2.6.35 kernel, clock_adjtime() is defined as adjtimex() under #ifndef HAVE_CLOCK_ADJTIME; with 3.18.12, clock_adjtime() is used as is. That does not seem to be the issue, though.

Thanks.

/ #cat /etc/ptp4l.conf
[global]
domainNumber 0
slaveOnly 1
priority1 128
priority2 128
clockClass 248
clockAccuracy 254
offsetScaledLogVariance 65535
freq_est_interval 1
time_stamping software
tx_timestamp_timeout 1
logging_level 6
verbose 1
use_syslog 0
summary_interval 0
[eth1]
delay_mechanism E2E
network_transport UDPv4
delayAsymmetry 0
logAnnounceInterval 1
logSyncInterval 0
logMinDelayReqInterval 0
logMinPdelayReqInterval 0
announceReceiptTimeout 3
syncReceiptTimeout 0
delay_filter moving_average
delay_filter_length 10
path_trace_enabled 0
fault_reset_interval 4


Keller, Jacob E
2016-01-15 19:29:24 UTC
It is almost certainly a result of the driver doing the mixed
hardware/software timestamps.

I suspect that the software clock is being slewed, but somehow your
timestamps are not being updated fast enough so these hardware
timestamps are no longer matching against the system clock.
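
As a rough illustration of the size of that effect, here is a back-of-the-envelope sketch. It assumes a simplified model in which the timestamp clock runs at its nominal rate between updates while the system clock is slewed by the frequency ptp4l reports. The real FPGA behavior is unknown to me, and the function names are hypothetical.

```python
def timestamp_error_ns(freq_ppb: float, seconds_since_resync: float) -> float:
    """Gap between a slewed system clock and an unslewed timestamp clock.

    A slew of freq_ppb parts per billion accumulates freq_ppb nanoseconds
    of difference per second, so ppb * s gives ns.
    """
    return freq_ppb * seconds_since_resync


def worst_case_error_ns(freq_ppb: float, resync_interval_s: float) -> float:
    """Largest gap reached just before the next resync (simplified model)."""
    return abs(timestamp_error_ns(freq_ppb, resync_interval_s))


# Values from this thread: freq +100000000 ppb, hardware updated once per second.
print(worst_case_error_ns(100_000_000, 1.0))  # 100000000.0 ns, i.e. 100 ms
```

Under these assumptions the timestamps could be off by up to 100 ms within each one-second update period, far more than the path delay of about 0.2 ms shown in the log.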

Out of curiosity, why not expose the hardware clock directly as a PHC?

Regards,
Jake
Daniel Le
2016-01-15 19:40:28 UTC
Hi Jake,

Because my NIC hardware is not 1588 capable, and exposing a PHC would require an FPGA change. Instead, I'm hoping to get better timestamp accuracy from the hardware clock, which is tuned to the host system clock, while running in software timestamping mode. (I understand this is the reverse of the LinuxPTP hardware timestamping configuration, where the system clock synchronizes to the PHC instead.)

Daniel

Keller, Jacob E
2016-01-15 23:57:18 UTC
Post by Daniel Le
Hi Jake,
Because my hardware NIC is not 1588 capable and that would require
FPGA change, however I'm hoping to get better timestamp accuracy from
the hardware clock that is tuned to the host system clock in software
timestamping mode (which I understand it's in the reverse direction
of LinuxPTP hardware timestamping configuration where the system
clock synchronizes to the PHC clock instead).
Daniel
If you have a hardware clock that you can slew and the ability to
take hardware timestamps, I fail to see why you are unable to
implement the PHC subsystem calls.

I suspect, however, that the way you are converting the timestamps
taken by hardware is incorrect, and we can't easily help you with that.

This is the first place I would look for an issue, especially if you
can confirm that your software timestamps without the special ethernet
driver work as expected.
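
One way to make "work as expected" concrete is to check the ptp4l output mechanically. The sketch below treats a run as converged when the last few samples are all in servo state s2 (locked) and within a bound. The bound of 100000 ns, the window of 5 samples, and the function name are assumptions for illustration, not values from LinuxPTP.

```python
import re

# Matches "master offset <ns> s<state>" as printed in this thread's logs.
OFFSET = re.compile(r"master offset\d*\s+(-?\d+) s(\d)")


def looks_converged(log: str, bound_ns: int = 100_000, last_n: int = 5) -> bool:
    """True if the last `last_n` samples are in state s2 and within bound_ns."""
    samples = [(int(m.group(1)), int(m.group(2))) for m in OFFSET.finditer(log)]
    tail = samples[-last_n:]
    return len(tail) == last_n and all(
        state == 2 and abs(offset) <= bound_ns for offset, state in tail
    )


# Lines taken from the log at the top of this thread.
BAD_LOG = """\
ptp4l[250724.197]: master offset1 -6601411564234 s0 freq +100000000 path delay 221780
ptp4l[250725.097]: master offset1 -6601412044146 s1 freq +99573391 path delay 228348
ptp4l[250725.998]: master offset1 -2711305334 s0 freq +99573391 path delay 228348
ptp4l[250726.898]: master offset1 -2711772472 s0 freq +99573391 path delay 222142
"""

print(looks_converged(BAD_LOG))  # False
```

Running the same check on a capture taken with the stock Ethernet driver and on one taken with the FPGA timestamps would show whether only the latter fails.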

Regards,
Jake
