[Linuxptp-users] Support for latest igb driver?
Richard Cochran
2016-12-22 18:13:59 UTC
Richard, I'm lost already. My igb driver 5.3.5.4 does not have
igb_tsync_interrupt().
See attached source for igb_main.c. Am I supposed to be using some other
version of the igb driver?
Well, you can use anything you want, but that file is rather out of
date WRT the mainline Linux driver. It lacks this change

commit 61d7f75f45231e4a2f2ab7d975555f55f0019800
Author: Richard Cochran <***@gmail.com>
Date: Fri Nov 21 20:51:10 2014 +0000

igb: refactor time sync interrupt handling

The code that handles the time sync interrupt is repeated in three
different places. This patch refactors the identical code blocks into
a single helper function.

which was merged in v4.0. I recommend using a recent mainline kernel version.
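To make the shape of that refactoring concrete, here is a toy sketch in plain C. It is not the driver code: every `sketch_` name, the struct, and the bit positions are made up for illustration, and a counter stands in for schedule_work(). It only shows three interrupt entry points delegating to one shared helper instead of each carrying its own copy of the time sync handling.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit positions, chosen only for this sketch. */
#define SKETCH_ICR_TS     (1u << 0)
#define SKETCH_TSICR_TXTS (1u << 1)

/* Stand-in for the adapter: a fake TSICR register and a work counter. */
struct sketch_adapter {
    uint32_t tsicr;
    unsigned int tx_work_scheduled;
};

/* The single shared helper that replaces the three identical blocks. */
static void sketch_tsync_interrupt(struct sketch_adapter *adapter)
{
    uint32_t tsicr;

    assert(adapter != NULL);
    tsicr = adapter->tsicr;
    if (tsicr & SKETCH_TSICR_TXTS) {
        /* acknowledge the (fake) interrupt cause */
        adapter->tsicr &= ~SKETCH_TSICR_TXTS;
        /* stand-in for schedule_work(&adapter->ptp_tx_work) */
        adapter->tx_work_scheduled++;
    }
}

/* Each entry point now only checks the cause bit and delegates. */
static void sketch_msix_other(struct sketch_adapter *adapter, uint32_t icr)
{
    if (icr & SKETCH_ICR_TS)
        sketch_tsync_interrupt(adapter);
}

static void sketch_intr_msi(struct sketch_adapter *adapter, uint32_t icr)
{
    if (icr & SKETCH_ICR_TS)
        sketch_tsync_interrupt(adapter);
}

static void sketch_intr(struct sketch_adapter *adapter, uint32_t icr)
{
    if (icr & SKETCH_ICR_TS)
        sketch_tsync_interrupt(adapter);
}
```

With this layout, a change to the acknowledge logic is made in one place rather than three.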
static irqreturn_t igb_msix_other(int irq, void *data)
{
    ...
#ifdef HAVE_PTP_1588_CLOCK
    if (icr & E1000_ICR_TS) {
        u32 tsicr = E1000_READ_REG(hw, E1000_TSICR);
        if (tsicr & E1000_TSICR_TXTS) {
            /* acknowledge the interrupt */
            E1000_WRITE_REG(hw, E1000_TSICR, E1000_TSICR_TXTS);
            /* retrieve hardware timestamp */
            schedule_work(&adapter->ptp_tx_work);
        }
    }
#endif /* HAVE_PTP_1588_CLOCK */
    ...
    return IRQ_HANDLED;
}
static irqreturn_t igb_intr_msi(int irq, void *data)
{
    ...
#ifdef HAVE_PTP_1588_CLOCK
    if (icr & E1000_ICR_TS) {
        u32 tsicr = E1000_READ_REG(hw, E1000_TSICR);
        if (tsicr & E1000_TSICR_TXTS) {
            /* acknowledge the interrupt */
            E1000_WRITE_REG(hw, E1000_TSICR, E1000_TSICR_TXTS);
            /* retrieve hardware timestamp */
            schedule_work(&adapter->ptp_tx_work);
        }
    }
#endif /* HAVE_PTP_1588_CLOCK */
    ...
    return IRQ_HANDLED;
}
static irqreturn_t igb_intr(int irq, void *data)
{
    ...
#ifdef HAVE_PTP_1588_CLOCK
    if (icr & E1000_ICR_TS) {
        u32 tsicr = E1000_READ_REG(hw, E1000_TSICR);
        if (tsicr & E1000_TSICR_TXTS) {
            /* acknowledge the interrupt */
            E1000_WRITE_REG(hw, E1000_TSICR, E1000_TSICR_TXTS);
            /* retrieve hardware timestamp */
            schedule_work(&adapter->ptp_tx_work);
        }
    }
#endif /* HAVE_PTP_1588_CLOCK */
    ...
    return IRQ_HANDLED;
}
Try deleting the lines
/* acknowledge the interrupt */
E1000_WRITE_REG(hw, E1000_TSICR, E1000_TSICR_TXTS);
in each of the three cases.

HTH,
Richard
Rich Schmidt
2016-12-30 16:31:43 UTC
I am sorry to report that the proposed fix to the problem SLAVE to FAULTY
on FAULT_DETECTED (FT_UNSPECIFIED),
shown below did not resolve the issue.
Red Hat LINUX: with source kernel 4.9.0
Intel igb driver: 5.4.0-k

Prior to compiling the kernel:

cd /usr/src/linux-4.9/drivers/net/ethernet/intel/igb
Edit igb_main.c and comment out at line 5715:
/* wr32(E1000_TSICR, ack); */

Ran fine for a while then failed as shown below. Able to restore by
killing ptp4l, rmmod igb; modprobe igb, restart ptp4l.

Here is the ptp4l log after running successfully for 26.65 hours:

ptp4l[101975.294]: master offset -58 s2 freq +831 path delay 1632
ptp4l[101976.294]: linreg: points 8 slope 0.999999144 intercept 3 err 25
ptp4l[101976.294]: master offset -10 s2 freq +853 path delay 1632
ptp4l[101976.900]: port 1: delay timeout
ptp4l[101976.910]: timed out while polling for tx timestamp
ptp4l[101976.910]: increasing tx_timestamp_timeout may correct this issue, but it is likely caused by a driver bug
ptp4l[101976.910]: port 1: send delay request failed
ptp4l[101976.910]: port 1: SLAVE to FAULTY on FAULT_DETECTED (FT_UNSPECIFIED)
ptp4l[101976.910]: waiting 2^{4} seconds to clear fault on port 1
ptp4l[101992.911]: clearing fault on port 1
ptp4l[101992.911]: config item enp1s0f0.logMinDelayReqInterval is 2
ptp4l[101992.911]: config item enp1s0f0.logAnnounceInterval is 0
ptp4l[101992.911]: config item enp1s0f0.announceReceiptTimeout is 4
ptp4l[101992.911]: config item enp1s0f0.syncReceiptTimeout is 0
ptp4l[101992.911]: config item enp1s0f0.transportSpecific is 0
ptp4l[101992.911]: config item enp1s0f0.logSyncInterval is 0
ptp4l[101992.911]: config item enp1s0f0.logMinPdelayReqInterval is 2
ptp4l[101992.911]: config item enp1s0f0.neighborPropDelayThresh is 20000000
ptp4l[101992.911]: config item enp1s0f0.min_neighbor_prop_delay is -20000000
ptp4l[101992.911]: config item enp1s0f0.udp_ttl is 1
ptp4l[101992.915]: driver changed our HWTSTAMP options
ptp4l[101992.915]: tx_type 1 not 1
ptp4l[101992.915]: rx_filter 1 not 12
ptp4l[101992.915]: config item (null).dscp_event is 0
ptp4l[101992.915]: config item (null).dscp_general is 0
ptp4l[101992.915]: port 1: FAULTY to LISTENING on FAULT_CLEARED
ptp4l[101993.294]: port 1: setting asCapable
ptp4l[101993.299]: port 1: new foreign master 0019dd.fffe.00085c-1
ptp4l[101995.299]: selected best master clock 0019dd.fffe.00085c
ptp4l[101995.299]: foreign master not using PTP timescale
ptp4l[101995.299]: running in a temporal vortex
ptp4l[101995.299]: port 1: LISTENING to UNCALIBRATED on RS_SLAVE
ptp4l[101996.295]: linreg: points 8 slope 0.999999153 intercept 142 err 29
ptp4l[101996.295]: master offset -150 s2 freq +705 path delay 1632
ptp4l[101996.295]: port 1: UNCALIBRATED to SLAVE on MASTER_CLOCK_SELECTED
ptp4l[101996.635]: port 1: delay timeout
ptp4l[101996.645]: timed out while polling for tx timestamp
ptp4l[101996.645]: increasing tx_timestamp_timeout may correct this issue, but it is likely caused by a driver bug
ptp4l[101996.645]: port 1: send delay request failed
ptp4l[101996.645]: port 1: SLAVE to FAULTY on FAULT_DETECTED (FT_UNSPECIFIED)
ptp4l[101996.645]: waiting 2^{4} seconds to clear fault on port 1
ptp4l[102012.645]: clearing fault on port 1
. . .
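For readers unfamiliar with the "timed out while polling for tx timestamp" message: on Linux, transmit timestamps are delivered on the socket's error queue, so the sender has to wait for that queue to become readable after each event message. Below is a minimal sketch of such a wait. It is not linuxptp's code; the function name, buffer sizes, and return convention are my own assumptions. In the log above there are about 10 ms between the delay timeout and the error, which would fit a tx_timestamp_timeout of 10, but that is only my reading of the timestamps.

```c
#include <assert.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/*
 * Hypothetical helper, not taken from linuxptp.
 * Waits up to timeout_ms for a message on the socket's error queue.
 * Returns 1 if a message was read, 0 on timeout, -1 on error.
 */
static int wait_for_tx_timestamp(int fd, int timeout_ms)
{
    /* POLLERR is always reported by poll(), so an error-queue
     * message wakes us even though only POLLPRI is requested. */
    struct pollfd pfd = { .fd = fd, .events = POLLPRI, .revents = 0 };
    char data[256];
    char control[256];
    struct iovec iov = { .iov_base = data, .iov_len = sizeof(data) };
    struct msghdr msg = { 0 };
    int ready;

    assert(timeout_ms >= 0);

    ready = poll(&pfd, 1, timeout_ms);
    if (ready < 0)
        return -1;
    if (ready == 0)
        return 0;   /* nothing arrived in time: the "timed out" case */

    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = control;
    msg.msg_controllen = sizeof(control);

    /* A real caller would walk the control messages to extract the
     * timestamp; this sketch only drains one error-queue message. */
    if (recvmsg(fd, &msg, MSG_ERRQUEUE) < 0)
        return -1;
    return 1;
}
```

If the driver never produces the timestamp, a wait like this returns 0 no matter how large the timeout is, which fits the hint in the log that a longer timeout may not help when the cause is a driver bug.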

Richard Schmidt, CTR
Time Service Dept.
US Naval Observatory
I've been testing linuxptp for about a year (now version 1.8) and am still
seeing the following failure always after 8 or more days of successful
ptp4l[4906544.301]: port 1: delay timeout
ptp4l[4906545.303]: timed out while polling for tx timestamp
ptp4l[4906545.303]: increasing tx_timestamp_timeout may correct this issue, but it is likely caused by a driver bug
ptp4l[4906545.303]: port 1: send delay request failed
I don't recall seeing this myself, but still this is the second
such igb failure report I have received recently.
I wonder whether the incorrect double TSICR acknowledge is the root
static void igb_tsync_interrupt(struct igb_adapter *adapter)
{
struct e1000_hw *hw = &adapter->hw;
struct ptp_clock_event event;
struct timespec64 ts;
u32 ack = 0, tsauxc, sec, nsec, tsicr = rd32(E1000_TSICR);
...
/* acknowledge the interrupts */
wr32(E1000_TSICR, ack);
}
According to the datasheet, the first rd32() should already
acknowledge the interrupts, but the 82580 (iirc) has a bug that
requires the additional wr32().
Try removing that last line, and see if things improve...
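To illustrate why a second acknowledge could matter, here is a toy model in plain C. It is not driver code and makes no claim about what the hardware really does. It assumes, for the sake of argument, that reading the register clears it and that writing a 1 clears the bit again. All `model_` names and the bit position are invented. Under those assumptions, an event latched between the read and the extra write is wiped out without ever being handled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit position only, not taken from the datasheet. */
#define MODEL_TSICR_TXTS (1u << 1)

/* Toy register: just the set of currently latched cause bits. */
struct model_tsicr {
    uint32_t pending;
};

/* Assumed read-to-clear behaviour: return the causes and clear them. */
static uint32_t model_rd32(struct model_tsicr *reg)
{
    uint32_t value = reg->pending;

    reg->pending = 0;
    return value;
}

/* Assumed write-1-to-clear behaviour for the explicit acknowledge. */
static void model_wr32(struct model_tsicr *reg, uint32_t ack)
{
    reg->pending &= ~ack;
}

/* The "hardware" latches a new Tx timestamp event. */
static void model_hw_event(struct model_tsicr *reg)
{
    reg->pending |= MODEL_TSICR_TXTS;
}

/*
 * One handler pass in which a second event arrives between the read
 * and the optional explicit acknowledge. Returns true if that second
 * event is still pending afterwards, false if it was lost.
 */
static bool model_second_event_survives(bool extra_ack)
{
    struct model_tsicr reg = { 0 };
    uint32_t ack;

    model_hw_event(&reg);                       /* first event */
    ack = model_rd32(&reg) & MODEL_TSICR_TXTS;  /* read clears it */
    assert(ack == MODEL_TSICR_TXTS);

    model_hw_event(&reg);                       /* second event in the window */
    if (extra_ack)
        model_wr32(&reg, ack);                  /* the write in question */

    return (reg.pending & MODEL_TSICR_TXTS) != 0;
}
```

In this model `model_second_event_survives(false)` is true and `model_second_event_survives(true)` is false. Whether the real part behaves like the model is exactly what removing the write is meant to test.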
Thanks,
Richard
--
"If you want to build a ship, don’t drum up people to collect wood and
don’t assign them tasks and work, but rather teach them to long for the
endless immensity of the sea."

- *Antoine de Saint-Exupéry*
Richard Cochran
2017-01-02 15:57:21 UTC
Post by Rich Schmidt
I am sorry to report that the proposed fix to the problem SLAVE to FAULTY
on FAULT_DETECTED (FT_UNSPECIFIED),
shown below did not resolve the issue.
Red Hat LINUX: with source kernel 4.9.0
Intel igb driver: 5.4.0-k
Hm. You said you have an i350? I don't have that one.

Can you post your `lspci` output?
Post by Rich Schmidt
Ran fine for a while then failed as shown below. Able to restore by
killing ptp4l, rmmod igb; modprobe igb, restart ptp4l.
Sure sounds like a driver or HW issue.
Can you hit the bug sooner by increasing the message rate?

For example, logMinDelayReqInterval=-4
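For anyone unsure what that setting means in practice: PTP message intervals are given as the base-2 logarithm of the interval in seconds. A small sketch of the arithmetic follows; the function name and the sanity bound are my own, not from linuxptp.

```c
#include <assert.h>

/*
 * Convert a PTP log2 interval to seconds.
 * Hypothetical helper; the +/-30 bound is an arbitrary sanity check.
 */
static double ptp_log_interval_to_seconds(int log_interval)
{
    double seconds = 1.0;
    int i;

    assert(log_interval >= -30 && log_interval <= 30);

    /* positive exponents double the interval */
    for (i = 0; i < log_interval; i++)
        seconds *= 2.0;
    /* negative exponents halve it */
    for (i = 0; i > log_interval; i--)
        seconds /= 2.0;

    return seconds;
}
```

The value 2 from the posted log gives a nominal 4 s between delay requests, while -4 gives 0.0625 s, that is 16 requests per second or 64 times the rate. The same rule explains the "waiting 2^{4} seconds" line in the log: the fault is cleared 16 s later.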

I'll try some long term testing here...

Thanks,
Richard
