Discussion:
[Linuxptp-users] ptp4l stack integration via linux OS with least burden on the Firmware and major PTP offload to the hardware
MSR, CHANDRASEKHAR
2017-04-12 18:41:45 UTC
Hi Friends,

I am struggling to visualize a proper system that runs ptp4l on our hardware, which supports Linux along with hardware timestamping. Could you help bring some clarity to the following question? Please note that I am a systems architect and not a Linux expert, and thus the following flow is a generic one. Please fill in the gaps from the Linux perspective, if you see any. I appreciate your help on this long thread.

The ptp4l stack generates the sync packets, which can be layer 2 or layer 3 (IPv4/IPv6) with UDP, etc. Each packet gets a set of skbufs plus one more skbuf for associating the corresponding timestamp from the hardware, which completes the DMA flow for a PTP packet. These sync packets travel along with other regular Ethernet and IP packets, which do not need the extra skbuf for the timestamp. The hardware (DMA + PTP packet parser + MAC + PHY) has the capability to timestamp only the PTP packets. Now, WITHOUT BURDENING THE DRIVER/FIRMWARE (the driver should not parse the packets to find out whether they are PTP or regular non-PTP packets), can I assign an extra skbuf to a PTP packet so that the timestamp is associated with the packet's DMA flow? In other words, the non-PTP packets will be parsed in the hardware without generating any timestamps, and thus their DMA flow (considering TX) will terminate without the extra skbuf (not needed, as they do not generate timestamps in the hardware). Given this scenario,

How does the driver know for which packet it should complete the DMA flow by waiting for the timestamp and associating it with the extra skbuf? If this can be driven by the respective application (ptp4l for PTP packets, and TCP/UDP applications for regular non-PTP traffic) in a way that the firmware/driver learns of it easily, without any burdensome operation such as parsing, it would be very efficient (more traffic on the Ethernet link). More importantly, the underlying hardware does the tougher job of producing timestamps for only the necessary packets.

A simple solution is timestamping all the packets and making the dma flow uniform to all the packets - ptp and non-ptp regular packets, where each packet's timestamp whether needed or not for the ptp4l stack will be stored in the hardware fifo. It can lead to overflow due to slower turnaround from the driver or in the worst case slow down the link losing the throughput.

My preferred solution: The DMA header has a control bit that clearly marks a packet as PTP or non-PTP. The firmware has two threads, MAC and PTP, and the control bit determines which thread has to complete the DMA flow. For a non-PTP packet, the MAC driver completes the DMA flow by sending the status back without any timestamp in it (I would prefer to have no bytes allocated for the timestamp at all). For a PTP packet, the PTP driver completes the DMA flow by sending the final status back to the DMA driver with the timestamp inserted in the status word (extra bytes are present in the DMA status word for this). The PTP driver knows that it has to read the status register to check whether the PTP packet's timestamp is ready in the FIFO. The MAC driver, however, does not bother with this at all and sends its status quickly. I see no requirement to sequence the DMA completion status between PTP and non-PTP packets. I feel this is very efficient and fast: it enables the highest throughput on the Ethernet link and meets the objective that the PTP flow should use the link only to the absolute minimum required and be least intrusive to the regular non-PTP flow.
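
To make the two completion paths concrete, here is a minimal Python sketch of the idea. It is only a model, not driver code: the `PTP_FLAG` bit, the `TxDescriptor` fields and the `complete_tx` helper are hypothetical names chosen for illustration, and the timestamp FIFO is modeled as a simple queue.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional

PTP_FLAG = 0x1  # hypothetical control bit in the DMA header


@dataclass
class TxDescriptor:
    packet_id: int
    control: int = 0                     # bit 0 set => timestamp wanted
    done: bool = False
    timestamp_ns: Optional[int] = None   # filled in only on the PTP path


def complete_tx(desc: TxDescriptor, ts_fifo: deque) -> bool:
    """Try to complete one descriptor; return True once status is written back."""
    if not desc.control & PTP_FLAG:
        desc.done = True                 # MAC path: no timestamp, ack at once
        return True
    if not ts_fifo:
        return False                     # PTP path: timestamp not ready yet
    desc.timestamp_ns = ts_fifo.popleft()  # PTP path: attach timestamp, then ack
    desc.done = True
    return True


fifo = deque()
regular, sync = TxDescriptor(1), TxDescriptor(2, control=PTP_FLAG)
print(complete_tx(regular, fifo))   # True: acked without a timestamp
print(complete_tx(sync, fifo))      # False: still waiting for the FIFO
fifo.append(1_000_000_123)          # hardware pushes the egress timestamp
print(complete_tx(sync, fifo), sync.timestamp_ns)
```

In this model the regular packet never touches the FIFO, which is the property the control bit is meant to provide.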

Thanking you in anticipation,
Regards,
Chandra

(c) : +60.175508142
(O): +60.4.636.6412

"Knowledge speaks, Wisdom listens"
Richard Cochran
2017-04-12 19:40:46 UTC
Chandra,

Your question has nothing to do with linuxptp (user space stack) or
even with the Linux kernel, and as such it is off topic for this list.

Having said that, I cannot resist offering an opinion...
Post by MSR, CHANDRASEKHAR
A simple solution is timestamping all the packets and making the dma
flow uniform to all the packets - ptp and non-ptp regular packets,
where each packet's timestamp whether needed or not for the ptp4l
stack will be stored in the hardware fifo.
IMHO this is the only reasonable approach for modern MAC hardware. Just
write the time stamp into the packet descriptor and be done with it.
It is just eight bytes, after all. No two threads, no packet parsing
or alternate paths. KISS.

Also, precise time stamps are useful to other applications beyond PTP,
and so making PTP frames into a special case is artificially limiting
the usefulness of your HW.
Post by MSR, CHANDRASEKHAR
It can lead to overflow due to slower turnaround from the driver or
Nonsense. The driver must read the descriptor in any case, and it
will only handle the time stamp if the option is enabled.
Post by MSR, CHANDRASEKHAR
in the worst case slow down the link losing the throughput.
Again, you can have a descriptor bit that tells whether to copy the
time stamp back or not.
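
As a rough illustration of this suggestion, the Python sketch below models a single completion path in which the hardware writes an eight-byte time stamp into the descriptor and a descriptor bit says whether to copy it back. The 16-byte layout, the `TS_ENABLE` bit and both function names are assumptions made up for the example, not any real MAC's format.

```python
import struct

TS_ENABLE = 0x1  # hypothetical "copy time stamp back" descriptor bit

# Hypothetical layout: 32-bit flags, 32-bit length, 64-bit time stamp in ns.
DESC = struct.Struct("<IIQ")


def hw_writeback(flags: int, length: int, now_ns: int) -> bytes:
    """Model the MAC writing status back into the descriptor."""
    ts = now_ns if flags & TS_ENABLE else 0
    return DESC.pack(flags, length, ts)


def driver_complete(raw: bytes, timestamping_enabled: bool):
    """One completion path for every frame; the time stamp is used only if enabled."""
    flags, length, ts = DESC.unpack(raw)
    if timestamping_enabled and flags & TS_ENABLE:
        return length, ts
    return length, None


print(DESC.size)  # 16
print(driver_complete(hw_writeback(TS_ENABLE, 64, 123_456), True))
print(driver_complete(hw_writeback(0, 1500, 123_456), True))
```

The driver reads the same descriptor either way, so there is no second thread and no parsing in this model.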

Just my 2 cents,

Richard
MSR, CHANDRASEKHAR
2017-04-14 14:47:30 UTC
Hi Richard,

Thank you very much for responding to my question, though you could have easily shut me off given the 'off topic' nature of the question. I need some basic clarifications on your response.

You say "IMHO this is the only reasonable approach for modern MAC hardware" and "Nonsense. The driver must read the descriptor in any case, and it will only handle the time stamp if the option is enabled." My confusion here is this: you seem to support timestamping all the packets, yet the driver will handle the timestamp only if the option is enabled. If the driver knows from the flags in the DMA descriptor whether to handle the timestamp or not, then the underlying MAC hardware can take advantage of those flags too, so that timestamping is carried out only for the packets (any packets, PTP or non-PTP) that have the option enabled. As you might know, the hardware usually stores the timestamps in a FIFO, which is read out on polling the status or on a suitable interrupt. By distinguishing between packets that need timestamps (PTP or non-PTP) and packets that do not, the number of timestamps that the MAC hardware can push into the FIFO is limited, which saves considerable silicon. This matters especially for ASICs, where the depths are fixed in advance, unlike FPGAs. Further, the driver has to wait some more time after the packet is transmitted in order to complete the DMA ack with the timestamp (which has to be collected from the FIFO and put back in the descriptor). For the packets that do not need timestamps, however, the driver can give the ack quickly and fetch the next set of descriptors. Thus, timestamping only as needed by the MAC hardware, rather than timestamping every packet, is really the need of the hour. Please share your opinion.
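
The FIFO-depth argument can be explored with a toy Python model. Everything numeric here is an assumption picked for illustration (a 16-entry FIFO, a driver that drains one entry every four frames, 1% of frames flagged for timestamping), and `simulate` is a hypothetical helper. It models only the FIFO-based readout described above, not a design that writes the timestamp straight into the descriptor.

```python
from collections import deque


def simulate(packets, fifo_depth, drain_every):
    """Count timestamps lost because the FIFO was full.

    packets: iterable of booleans, True if the frame is to be timestamped.
    drain_every: the driver pops one entry after every `drain_every` frames.
    """
    fifo, lost = deque(), 0
    for i, wants_ts in enumerate(packets, start=1):
        if wants_ts:
            if len(fifo) < fifo_depth:
                fifo.append(i)
            else:
                lost += 1          # FIFO overflow: this timestamp is dropped
        if i % drain_every == 0 and fifo:
            fifo.popleft()         # slow driver turnaround
    return lost


frames = 1000
stamp_all = [True] * frames
stamp_flagged = [i % 100 == 0 for i in range(frames)]  # assumed 1% of traffic

print("timestamp all:    ", simulate(stamp_all, fifo_depth=16, drain_every=4))
print("timestamp flagged:", simulate(stamp_flagged, fifo_depth=16, drain_every=4))
```

With these assumed rates the "timestamp all" run loses entries while the flagged run does not; different rates or depths would shift that balance.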

You say "It is just eight bytes, after all". Actually, it is not a PTM (PCIe) timestamp. In 1588v2, the timestamp is 10 bytes (80 bits). Further, our hardware also provides 16 bits of fractional nanoseconds (giving a resolution of 2^-16 ns). Understandably, the proprietary stacks are able to handle the complete 12 bytes of timestamp, unlike the public-domain stacks. Just FYI. Do share your opinion.
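
For reference, the sizes under discussion can be sketched in Python. The 10-byte part follows the 1588 Timestamp layout (48-bit seconds followed by 32-bit nanoseconds); appending the 16-bit fraction directly after it is an assumption about how such hardware might lay out its 12 bytes, and both function names are hypothetical. The second helper collapses the value into a 64-bit nanosecond count, which is one way to read the "eight bytes" figure.

```python
import struct

NS_PER_SEC = 1_000_000_000


def pack_ptp_timestamp(seconds: int, nanoseconds: int, frac_ns_2_16: int = 0) -> bytes:
    """Pack 48-bit seconds + 32-bit nanoseconds (10 bytes), followed by a
    hypothetical 16-bit fraction in units of 2**-16 ns (12 bytes in total)."""
    if not 0 <= seconds < 1 << 48:
        raise ValueError("seconds does not fit in 48 bits")
    if not 0 <= nanoseconds < NS_PER_SEC:
        raise ValueError("nanoseconds out of range")
    sec_hi, sec_lo = seconds >> 32, seconds & 0xFFFFFFFF
    return struct.pack(">HIIH", sec_hi, sec_lo, nanoseconds, frac_ns_2_16)


def to_ns64(raw: bytes) -> int:
    """Collapse the 12-byte value into a 64-bit nanosecond count (fraction dropped)."""
    sec_hi, sec_lo, nanoseconds, _frac = struct.unpack(">HIIH", raw)
    total = ((sec_hi << 32) | sec_lo) * NS_PER_SEC + nanoseconds
    if total >= 1 << 64:
        raise OverflowError("does not fit in 64 bits")
    return total


raw = pack_ptp_timestamp(1, 5, 0x8000)   # 1 s + 5.5 ns
print(len(raw))       # 12
print(to_ns64(raw))   # 1000000005
```

An unsigned 64-bit nanosecond count covers roughly 584 years, so the 48-bit seconds field only matters beyond that range; the sub-nanosecond fraction is what the conversion gives up.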

Thank you again for sharing your expertise with us - it is very helpful.


Thanking you in anticipation,
Regards,
Chandra

Keller, Jacob E
2017-04-14 16:11:41 UTC
It's certainly easier in software if all packets have a timestamp. Additionally, I imagine it's easier in hardware if there is no special casing, since you wouldn't need any circuitry to decide which frames get timestamped and which do not. This should be simpler. It seems to me that yes, you do pay the 8 or 16 bytes per frame if you always have timestamps, but that cost is smaller than the huge complexity required to have the hardware pre-determine which packets get timestamped if it only supports a subset of "all packets", while also forcing software to be aware of what the hardware will or will not timestamp.

It may be that specific hardware has limitations on how it timestamps for other reasons, but obviously we can't comment on how the hardware is designed, since we didn't design it. I don't think a FIFO for pushing timestamps by the MAC is necessarily the only way to do it, and certainly lots of hardware already supports "timestamp all packets" today. This is already a valuable feature to software, so if you support it, then you don't need support for any subset of all packets.

Thanks,
Jake
MSR, CHANDRASEKHAR
2017-04-14 16:28:33 UTC
Hi Jake,

I appreciate your feedback. Yes, our hardware and most other hardware supports 'timestamp all', which is the easier option for the software, as you mention. However, it does not suit every customer:
• Given encapsulations such as MPLS-TP, which go beyond the IPv4/IPv6 UDP packets that Linux supports, we definitely need proprietary hardware for improved parsing capabilities.
• Also, the 1-step implementation requires inserting the time of day (ToD) and updating the correction field (CF) at specific locations while the packets are in transit (unlike the 2-step mechanism, where the timestamps can be collected later, after the packets have egressed). These offsets are obtained by parsing the packets inside the hardware rather than by inflating the descriptors, which would consume many CPU cycles.
• Most importantly, we have seen two groups: a generic-OS group (Linux etc.), which prefers 'timestamp all', and a proprietary-OS group (network switches, processors), which takes advantage of the underlying hardware to get the best performance (timestamping only the subset of packets that the stack directs). Needless to say, the hardware has to support both groups, and we do that.
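
To illustrate the in-transit CF update mentioned in the second point, here is a small Python sketch. It relies on two facts from the 1588 common header: the correctionField starts at octet 8 and is a signed 64-bit value in units of 2^-16 ns. The function name and the idea of passing the residence time as a float are assumptions for the example; real 1-step hardware does this on the wire and must first locate the PTP header behind whatever encapsulation is in use.

```python
import struct

CF_OFFSET = 8  # correctionField position within the PTP common header


def add_residence_time(ptp_msg: bytes, residence_ns: float) -> bytes:
    """Return a copy of the message with residence time added to correctionField.

    correctionField is a signed 64-bit value in units of 2**-16 ns.
    """
    (cf,) = struct.unpack_from(">q", ptp_msg, CF_OFFSET)
    cf += round(residence_ns * (1 << 16))
    out = bytearray(ptp_msg)
    struct.pack_into(">q", out, CF_OFFSET, cf)
    return bytes(out)


header = bytes(34)                        # all-zero 34-byte common header
updated = add_residence_time(header, 1.5)
print(struct.unpack_from(">q", updated, CF_OFFSET)[0])  # 98304, i.e. 1.5 * 2**16
```

The offset is fixed relative to the start of the PTP message, so the parsing effort lies in finding that start, which is where encapsulations such as MPLS-TP come in.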

Thanking you in anticipation,
Regards,
Chandra

