Richard,
Thanks for the response.
I now have one board running kernel 3.18 and another running kernel 4.9.
I still see the issue with 3.18, but I haven't seen it yet on 4.9.
Unfortunately, we have a proprietary driver for a device on the PCIe bus that doesn't yet support 4.x kernels, and it is this driver (via an application) that generates most of the network traffic.
I might have to port all of the stmmac changes back to 3.18.
If I add 37 seconds to the time returned by getnstimeofday() in the driver, the effect of the "glitch" is less pronounced.
The 3.18 kernel's timekeeping.c provides timekeeping_get_tai_offset(), which I thought might give me the UTC offset, but it returns 0 at the point where I call it.
Is there a call within the kernel to find the UTC offset?
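For context on the question above: timekeeping_get_tai_offset() reports the kernel's TAI-UTC offset, and that offset stays 0 until something in user space sets it with adjtimex() using the ADJ_TAI mode. The same value is visible from user space in the tai field of struct timex, and CLOCK_TAI reads as CLOCK_REALTIME plus that offset. The minimal sketch below, assuming a Linux/glibc user-space build, queries the offset read-only and shows the manual 37 s correction described above; utc_to_tai() is a hypothetical helper, and the 37 s value is supplied by the caller rather than looked up.

```c
#include <sys/timex.h>
#include <time.h>

/* Query the kernel's TAI-UTC offset from user space. This is the value the
 * kernel keeps as its TAI offset (what timekeeping_get_tai_offset() returns).
 * modes = 0 makes adjtimex() a read-only query, so no privileges are needed.
 * Returns the offset in seconds, or -1 if adjtimex() fails. */
static int read_kernel_tai_offset(void)
{
    struct timex tx = { 0 };

    if (adjtimex(&tx) == -1)
        return -1;
    return tx.tai;  /* stays 0 until something sets it with ADJ_TAI */
}

/* Hypothetical helper: shift a UTC timestamp onto the PTP (TAI) timescale by
 * a caller-supplied whole-second offset, e.g. 37 s as of 2017. */
static struct timespec utc_to_tai(struct timespec utc, int tai_utc_offset_s)
{
    struct timespec tai = utc;

    tai.tv_sec += tai_utc_offset_s;
    return tai;
}
```

If read_kernel_tai_offset() returns 0 on a running system, nothing has told the kernel about the offset yet, which would be consistent with the 0 seen from timekeeping_get_tai_offset().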
Regards
Ian T.
-----Original Message-----
From: Richard Cochran [mailto:***@gmail.com]
Sent: Thursday, April 06, 2017 12:49 AM
To: Ian Thompson
Cc: linuxptp-***@lists.sourceforge.net
Subject: [External] Re: [Linuxptp-users] PTP - MAC time
Post by Ian Thompson
Why is the time that gets put into the PTP registers in the STM MAC Unix (UTC) time rather than PTP (TAI) time?
See below.
Post by Ian Thompson
Possibly following on from David’s post.
We have a system with 18 boards in a rack. Each board has an Altera SoC whose STM Ethernet MAC is connected via gigabit Ethernet to a PTP-aware Arista switch, which in turn connects to a Spectracom grandmaster.
The boards are running Linux kernel 3.15.0.
That HW puts the time stamps into the buffer descriptor, and so in theory it should never miss a time stamp. This is most likely a driver bug. Looking at the git log I see:
v4.11-rc1~124^2~171^2~12 deeb637 net: stmmac: remove freesoftware address
v4.9-rc7~33^2~33^2~1 ba1ffd7 stmmac: fix PTP support for GMAC4
v4.9-rc7~33^2~33^2~2 d204205 stmmac: update the PTP header file
v4.9-rc4~28^2~68 c30a70d stmmac: fix and review the ptp registration.
v4.9-rc4~28^2~96 50756eb stmmac: fix an error code in stmmac_ptp_register()
v4.9-rc1~28^2~10 7086605 stmmac: fix error check when init ptp
v4.9-rc1~127^2~108 efee95f ptp_clock: future-proofing drivers against PTP subsystem becoming optional
v4.1-rc1~128^2~100^2~5 e7ea55b ptp: stmmac: use helpers for converting ns to timespec.
v4.1-rc1~128^2~119^2~6 3f6c465 ptp: stmmac: convert to the 64 bit get/set time methods.
v3.17-rc5~41^2~38 5566401 stmmac: ptp: fix the reference clock
v3.17-rc5~41^2~50 f95f404 stmmac: set ptp_clock to NULL while unregister
v3.15-rc1~113^2~108^2~5 4986b4f0 ptp: drivers: set the number of programmable pins.
v3.13-rc7~13^2 7cd0139 stmmac: Fix incorrect spinlock release and PTP cap detection.
v3.10-rc1~66^2~195 32ceabc stmmac: improve/review and fix kernel-doc
v3.10-rc1~66^2~327^2~1 92ba688 stmmac: add the support for PTP hw clock driver
v3.10-rc1~66^2~327^2~2 891434b stmmac: add IEEE PTPv1 and PTPv2 support.
ba1ffd7 looks especially suspicious.
Post by Ian Thompson
Apr 4 13:42:04 localhost user.info ptp4l: [537.164] rms 123 max 599 freq +255 +/- 39 delay 7362 +/- 48
Apr 4 13:42:29 localhost user.err ptp4l: [561.387] timed out while polling for tx timestamp
Apr 4 13:42:29 localhost user.err ptp4l: [561.387] increasing tx_timestamp_timeout may correct this issue, but it is likely caused by a driver bug
Apr 4 13:42:29 localhost user.err ptp4l: [561.387] port 1: send delay request failed
Apr 4 13:42:29 localhost user.notice ptp4l: [561.387] port 1: SLAVE to FAULTY on FAULT_DETECTED (FT_UNSPECIFIED)
Apr 4 13:42:45 localhost user.notice ptp4l: [577.388] port 1: FAULTY to LISTENING on FAULT_CLEARED
Apr 4 13:42:45 localhost user.warn ptp4l: [577.414] clockcheck: clock jumped backward or running slower than expected!
Apr 4 13:42:45 localhost user.notice ptp4l: [577.414] port 1: new foreign master 000cec.fffe.0a085d-1
Apr 4 13:42:47 localhost user.notice ptp4l: [579.414] selected best master clock 000cec.fffe.0a085d
Apr 4 13:42:47 localhost user.notice ptp4l: [579.414] port 1: LISTENING to UNCALIBRATED on RS_SLAVE
Apr 4 13:42:54 localhost user.notice ptp4l: [587.164] port 1: UNCALIBRATED to SLAVE on MASTER_CLOCK_SELECTED
Apr 4 13:46:46 localhost user.info ptp4l: [818.414] rms 2312500092 max 37000001557 freq +246 +/- 250 delay 7358 +/- 46
Apr 4 13:51:02 localhost user.info ptp4l: [1074.413] rms 116 max 681 freq +256 +/- 48 delay 7373 +/- 88
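The "timed out while polling for tx timestamp" error at 13:42:29 means ptp4l sent a Delay_Req but the driver never delivered the hardware transmit timestamp to the socket's error queue within tx_timestamp_timeout (milliseconds). The sketch below illustrates that wait; it is a hypothetical helper loosely modeled on what ptp4l does, not ptp4l's actual code, and it assumes SO_SELECT_ERR_QUEUE was enabled on the socket so that queued timestamps raise POLLPRI.

```c
#include <poll.h>

/* Hypothetical helper loosely modeled on ptp4l: wait up to timeout_ms for a
 * TX timestamp to be queued on the socket's error queue.
 * Assumes SO_SELECT_ERR_QUEUE was enabled on fd so error-queue data raises
 * POLLPRI; POLLERR is reported by poll() regardless of the requested events.
 * Returns 1 if something is pending (the caller would then read it with
 * recvmsg(..., MSG_ERRQUEUE)), 0 on timeout, -1 on error or a stray wakeup. */
static int wait_for_tx_timestamp(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLPRI, .revents = 0 };
    int res = poll(&pfd, 1, timeout_ms);

    if (res < 0)
        return -1;          /* poll() itself failed */
    if (res == 0)
        return 0;           /* the "timed out while polling for tx timestamp" case */
    return (pfd.revents & (POLLPRI | POLLERR)) ? 1 : -1;
}
```

As the log shows, the timeout makes the delay request fail and the port goes from SLAVE to FAULTY; raising tx_timestamp_timeout only lengthens this wait and does not fix a timestamp the driver never delivers.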
Does this mean that a single lost delay request can cause this, or is there a retry mechanism?
One lost delay request shouldn't introduce such a large error. This is a driver bug. Notice that the time error is 37 seconds, which is the current UTC/TAI offset.
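To make the 37 second figure concrete: the max value in the 13:46:46 summary line is 37000001557 ns, and 37000001557 - 37 x 10^9 = 1557 ns, i.e. a 37 s step (TAI-UTC since 1 January 2017) plus ordinary servo error. The small sketch below does that split; both function names are hypothetical, and the 10 us tolerance used in the check is an arbitrary illustrative choice.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000LL

/* Split an offset in nanoseconds into whole seconds and a remainder. */
static void split_offset_ns(int64_t offset_ns, int64_t *sec, int64_t *rem_ns)
{
    *sec = offset_ns / NSEC_PER_SEC;
    *rem_ns = offset_ns % NSEC_PER_SEC;
}

/* Hypothetical check: is offset_ns within tolerance_ns of exactly n seconds? */
static int is_whole_second_step(int64_t offset_ns, int64_t n, int64_t tolerance_ns)
{
    int64_t diff = offset_ns - n * NSEC_PER_SEC;

    if (diff < 0)
        diff = -diff;
    return diff <= tolerance_ns;
}
```

Applied to the logged max, split_offset_ns() yields 37 s and 1557 ns, and is_whole_second_step(37000001557, 37, 10000) is true, which is what points at a UTC-versus-TAI mix-up rather than at gradual drift.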
When resetting the fault, ptp4l re-initializes HW time stamping.
The function stmmac_hwtstamp_ioctl() in
drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
programs the system time (UTC) into the PHC every time HW time stamping is enabled. It shouldn't do that.
Post by Ian Thompson
We have a lot of traffic leaving the boards but only PTP traffic
coming in. As we increase the off-board transfer rates, the problem
seems to occur more often.
That could indicate a driver or a HW issue, or both.
HTH,
Richard