gigabit switch that supports jumbo frames?

I think you're out of luck. I reckon the IEEE now considers the max frame size issue closed. Whenever this issue is brought up in the IEEE, the proponents of jumbo frames struggle to justify their case and a host of people pile in with valid objections. The IEEE can only change things if there is a clear consensus to do so, and there is definitely no consensus for standardising jumbo frames in 802.3.

You can find a sample of the debate in the high speed study group at:

formatting link
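
In practice that leaves jumbo frames as a non-standard, per-vendor knob. For illustration only, here is a minimal sketch of how a Linux host might ask the driver for a larger MTU through the SIOCSIFMTU ioctl; the "eth0" name and the 9000 value are placeholders, and whether the NIC driver and the attached switch actually accept it is entirely vendor-specific.

/* Sketch: request a 9000-byte MTU on a Linux interface via SIOCSIFMTU.
 * Whether the driver and the attached switch honour it is up to the
 * vendor, since jumbo frames are not standardized in 802.3. */
#include <net/if.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    const char *ifname = "eth0";   /* placeholder interface name */
    int mtu = 9000;                /* common jumbo value, not a standard */

    int fd = socket(AF_INET, SOCK_DGRAM, 0);  /* any socket will do for ioctls */
    if (fd < 0) { perror("socket"); return 1; }

    struct ifreq ifr;
    memset(&ifr, 0, sizeof ifr);
    strncpy(ifr.ifr_name, ifname, IFNAMSIZ - 1);
    ifr.ifr_mtu = mtu;

    if (ioctl(fd, SIOCSIFMTU, &ifr) < 0)      /* driver rejects sizes it can't do */
        perror("SIOCSIFMTU");
    else
        printf("%s MTU set to %d\n", ifname, mtu);

    close(fd);
    return 0;
}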

Reply to
Marris

How much of a help is frame bursting, which as I understand it groups logical frames together for transmission? Even without that, most NICs will group frames before interrupting the host to reduce interrupt overhead.

-- glen

Reply to
glen herrmannsfeldt

IIRC, that was also considered for fast ATM trunk links? It's been a while, but I think that same idea was proposed (and maybe implemented too). That takes care of the problem, I think. It reduces switching overhead.

Bert

Reply to
Albert Manfredi

Thanks - the parts of it I went through read very much like chicken and egg :)

rick jones

Reply to
Rick Jones

Glen, Bursting was specified for half-duplex gigabit Ethernet because the slot-time was increased for gigabit Ethernet, and in theory bursting leads to better utilization of a shared medium. However, switching came along and nobody deployed half-duplex gigabit Ethernet.

I don't think bursting helps with frame processing as you don't get any fewer frames and they have the same encapsulation.

Arthur.

Reply to
Marris

Marris wrote: (snip about frame bursting)

It would seem that the NIC should be able to process one burst much as it processes a jumbo frame, and with only a little extra work IP should be able to do the same thing. IP could, for example, give a 9000 MTU to TCP, accept the data, and then frame-burst it out. (That is, if full-duplex gigabit supports it, which I don't know.)

Not quite as easy as jumbo frames, but not so much harder, either.

-- glen

Reply to
glen herrmannsfeldt

IP fragmentation bad, TSO good, JumboFrame better :) I believe that the IETF's response to fragmentation is even "stronger" than the IEEE's response to JumboFrames. Such that in IPv6 there really isn't supposed to _be_ any fragmentation, and fragmenting is a rather more difficult thing to do - extra headers IIRC - and it won't (?) be done by intermediate routers at all.

Lose any one of the fragments of an IP datagram and the entire thing will have to be retransmitted by the originating system. In the end, though, I suspect a bit error in a JF frame isn't any worse than an IP fragment lost out of a chain of 9000/1500. IIRC there are some "funnies" involving firewalls and fragmented traffic too.
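
To put a rough number on that, a quick back-of-the-envelope calculation (my own illustration, assuming each on-wire frame is lost independently with probability p): a 9000-byte datagram carried as six 1500-byte fragments only survives if all six frames do, and losing any one of them costs a full 9000-byte retransmission.

/* Back-of-the-envelope: chance a 9000-byte IP datagram arrives intact
 * when carried as six 1500-byte fragments vs. one jumbo frame,
 * assuming each on-wire frame is lost independently with probability p.
 * Illustrative assumption only; real loss is rarely independent. */
#include <stdio.h>

int main(void)
{
    double p = 1e-5;                 /* assumed per-frame loss probability */
    double six_fragments = 1.0;
    for (int i = 0; i < 6; i++)
        six_fragments *= (1.0 - p);  /* all six fragments must arrive */
    double one_jumbo = 1.0 - p;      /* the single jumbo frame must arrive */

    printf("P(datagram ok), 6 fragments: %.10f\n", six_fragments);
    printf("P(datagram ok), 1 jumbo:     %.10f\n", one_jumbo);
    /* Losing any fragment forces the sender to retransmit all 9000 bytes,
     * so the fragmented case wastes roughly six times the bytes per loss. */
    return 0;
}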

I call TSO "poor man's jumbo frame" because it is still 1500 byte packets on the network, and the returning ACK stream is still the same as with a standard 1500 byte MTU. In the case of TSO, a single bit error will only toast one 1460 byte TCP segment (handwaving the sizes when TCP "timestamps" are present etc...)
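
For what it's worth, on a Linux box you can ask the driver whether TSO is currently on via the legacy ethtool ioctl; a minimal sketch (the interface name is a placeholder, and newer kernels expose the same information through other ethtool commands as well):

/* Sketch: query whether TCP Segmentation Offload is enabled on an
 * interface, using the legacy ETHTOOL_GTSO ethtool ioctl on Linux. */
#include <linux/ethtool.h>
#include <linux/sockios.h>
#include <net/if.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    const char *ifname = "eth0";            /* placeholder interface name */
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0) { perror("socket"); return 1; }

    struct ethtool_value eval = { .cmd = ETHTOOL_GTSO };
    struct ifreq ifr;
    memset(&ifr, 0, sizeof ifr);
    strncpy(ifr.ifr_name, ifname, IFNAMSIZ - 1);
    ifr.ifr_data = (void *)&eval;

    if (ioctl(fd, SIOCETHTOOL, &ifr) < 0)
        perror("ETHTOOL_GTSO");
    else
        printf("%s: TSO is %s\n", ifname, eval.data ? "on" : "off");

    close(fd);
    return 0;
}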

There are some 10G NICs playing with large receive offload, but I've not played with it enough yet to see how well it works. Issues could include, but not be limited to how long the NIC waits for subsequent in-order segments to arrive and what that does to latency etc.

rick jones

Reply to
Rick Jones

I probably didn't say it right. The idea was that TCP would pass 9000 bytes (more or less) of data ready to be sent using frame bursting. The higher levels of TCP would see 9000, but just before passing it to IP it would be constructed in the form of separate frames. If things work right, those frames are received by a frame burst NIC at the other end, and the inverse processing is done.

If not, they come out as normal 1500 byte frames and are processed normally.

The goal is to get jumbo frame performance when possible, and normal performance when not. Switches could handle them as bursts, buffer them just like jumbo frames, or separate them if needed.
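
A very rough sketch of the transmit half of that idea, purely as illustration - real stacks do this sort of carving in the driver or the NIC, and headers are hand-waved here:

/* Sketch of the transmit side of the idea: the upper layer hands down a
 * ~9000-byte block, and just before hitting the wire it is carved into
 * MTU-sized pieces that a non-bursting receiver would see as ordinary
 * 1500-byte frames.  Purely illustrative; headers are hand-waved. */
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define WIRE_MTU 1500   /* payload carried per on-wire frame */

/* Hypothetical hook standing in for "queue one frame for bursting". */
static void queue_frame(const unsigned char *payload, size_t len, int seq)
{
    printf("frame %d: %zu bytes\n", seq, len);
    (void)payload;
}

/* Carve a large logical block into MTU-sized frames. */
static void burst_transmit(const unsigned char *data, size_t len)
{
    int seq = 0;
    while (len > 0) {
        size_t chunk = len < WIRE_MTU ? len : WIRE_MTU;
        queue_frame(data, chunk, seq++);
        data += chunk;
        len  -= chunk;
    }
}

int main(void)
{
    static unsigned char block[9000];       /* the "jumbo" unit TCP sees */
    memset(block, 0xAB, sizeof block);
    burst_transmit(block, sizeof block);    /* six frames of 1500 bytes each */
    return 0;
}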

-- glen

Reply to
glen herrmannsfeldt

Modulo the inverse processing, and whether or not there is frame bursting, on the surface that sounds a bit like what Sun purports to do in their stack - they call it Multi-Data Transmit. I call it "poor-man's TSO" :) They stick with the 1500 byte MTU and the remote side is not trying to do receive segment coalescing, so the ACK stream remains the same as in a "normal" 1500 byte case.

The NICs doing "LRO" or Large Receive Offload appear to be trying to do something similar to the receive side of what you describe - aggregate several 1500 byte MTU segments into a single larger packet they give to the host. I'm not sure what happens with the returning ACK stream there though.

Or am I still missing a piece?

rick jones

Reply to
Rick Jones

And even better than that is an operating system that allows the network controller (NIC) to move data directly to/from application virtual memory space, without *any* OS intervention. Such a scheme makes the frame length almost completely irrelevant; the only issue becomes the per-frame overhead (e.g., preamble, FCS, etc.), which is fairly inconsequential, as opposed to the processing overhead, which is what everyone in this thread has been discussing. That is, with proper OS/NIC interaction, performance is no longer a strong function of frame length.

We (at DEC) employed exactly this scheme over *25 years ago* in the VAX CI port architecture, which was used to create VAXclusters of multiple mainframes, using a high-speed network for the backbone interconnect. At that time, the hardware was considered incredibly complex and expensive--over $20K (1978 dollars) per port. Today, the entire scheme could easily be implemented in inexpensive silicon; what's a few hundred thousand gates nowadays?

What is needed is to have the OS allow the NIC access to the system page table, so that it can do the virtual-to-physical memory mapping itself, page swap into physical memory as needed, and perform all appropriate data transfers without asking the OS for any additional processing or resources. VAX/VMS allowed this, but I know of no OS today where this idea is implemented, or even feasible. (Disclaimer: I'm a network architect, not an OS guru. I welcome comment and correction from those who know more than I do about this subject. I did not design the VAX CI architecture (I did work on the backbone network design), although I used it extensively.)
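
To make the idea a little more concrete, here is a purely hypothetical sketch of the kind of descriptor such a NIC would have to consume. It is not any real adapter's or OS's interface; it just illustrates handing the NIC a user virtual address plus enough context to do the virtual-to-physical mapping itself.

/* Hypothetical illustration only: the kind of descriptor a NIC would need
 * if, as in the VAX CI port architecture described above, it translated
 * application virtual addresses itself instead of being handed physical
 * buffer addresses by the OS.  No real NIC or OS API is implied. */
#include <stdint.h>

struct nic_dma_descriptor {
    uint64_t user_virtual_addr;   /* where the application's buffer lives */
    uint32_t length;              /* bytes to move for this transfer */
    uint64_t page_table_root;     /* root of the owning process's page table,
                                     so the NIC can do VA->PA mapping itself */
    uint16_t asid;                /* address-space / process identifier */
    uint16_t flags;               /* e.g. read vs. write, "fault pages in" */
};

/* Under this scheme the OS's only jobs are to grant the NIC access to the
 * page table (fill in page_table_root/asid once) and to post descriptors;
 * per-frame data movement then needs no OS intervention at all. */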

-- Rich Seifert Networks and Communications Consulting 21885 Bear Creek Way (408) 395-5700 Los Gatos, CA 95033 (408) 228-0803 FAX

Send replies to: usenet at richseifert dot com

Reply to
Rich Seifert

That is what "RNICs" are trying to do, isn't it? About the per-frame overhead - if the copies are gone, that makes the per-frame overhead the dominant part of the overhead (other than app processing, of course), doesn't it?

rick jones

Reply to
Rick Jones

In a simplified way, perhaps. What I am really getting at is the idea of making application transfers across a network appear (to the O/S) not to be data moving through a peripheral adapter, but as a memory-to-memory transfer.

Yes, but that is rather insignificant; Ethernet combined with TCP/IP imposes an overhead of only 78 bytes (40 of which are TCP/IP) on a total frame of 1538 bytes (including interframe gap, preamble, etc.), or 5%. Even with VLANs, encryption, and other baggage, it's not a big deal. It's surely not worth the trouble of using jumbo frames.
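
Spelling that arithmetic out, and showing what the same fixed 78 bytes cost on a 9000-byte jumbo frame for comparison:

/* The 5% figure above, worked out, plus the same calculation for a
 * 9000-byte MTU.  Per-frame overhead on the wire:
 * 12 IFG + 8 preamble + 14 MAC header + 4 FCS = 38 bytes of Ethernet,
 * plus 40 bytes of TCP/IP headers = 78 bytes. */
#include <stdio.h>

static void show(int mtu)
{
    const int eth = 12 + 8 + 14 + 4;     /* IFG, preamble, MAC header, FCS */
    const int tcpip = 40;                /* IPv4 + TCP, no options */
    int payload = mtu - tcpip;           /* application bytes per frame */
    int wire = mtu + eth;                /* total bytes of wire time per frame */
    printf("MTU %5d: %d payload, %d on wire, overhead %.1f%%\n",
           mtu, payload, wire, 100.0 * (eth + tcpip) / wire);
}

int main(void)
{
    show(1500);   /* 78/1538  ~ 5.1% */
    show(9000);   /* 78/9038  ~ 0.9% */
    return 0;
}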

-- Rich Seifert Networks and Communications Consulting 21885 Bear Creek Way (408) 395-5700 Los Gatos, CA 95033 (408) 228-0803 FAX

Send replies to: usenet at richseifert dot com

Reply to
Rich Seifert

Rich Seifert wrote in part:

This depends very much on OS & app design and support. In a primitive way, mmap() of an NFS file does this.

Undoubtedly true wrt transmission overhead. However, commonly used IBM PC-compatible x86 architectures have serious interrupt handling overhead, around 1000 ns. Jumbo frames reduce the frequency of interrupts. This affects receivers far more than transmitters, since transmission is normally _not_ interrupt driven.
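
A quick illustration of why the receive side cares, using the ~1000 ns figure above and assuming the worst case of one interrupt per frame at gigabit line rate (my numbers, not measurements):

/* Rough receive-side interrupt budget at gigabit line rate, assuming the
 * worst case of one interrupt per frame and ~1000 ns per interrupt
 * (the figure quoted above).  Coalescing or jumbo frames cut the rate. */
#include <stdio.h>

int main(void)
{
    const double line_rate = 125e6;          /* 1 Gb/s = 125 MB/s */
    const double irq_cost_s = 1000e-9;       /* ~1000 ns per interrupt */
    const int wire_sizes[] = { 1538, 9038 }; /* 1500 and 9000 MTU incl. overhead */

    for (int i = 0; i < 2; i++) {
        double frames_per_s = line_rate / wire_sizes[i];
        double cpu_fraction = frames_per_s * irq_cost_s;
        printf("%5d-byte frames: %8.0f interrupts/s, %4.1f%% of one CPU\n",
               wire_sizes[i], frames_per_s, 100.0 * cpu_fraction);
    }
    return 0;
}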

-- Robert

Reply to
Robert Redelmeier

I thought coalescing multiple packets into one interrupt was pretty much standard for modern gigabit interfaces?

Reply to
Walter Roberson

Walter Roberson wrote in part:

This will get around needing jumbo packets, but I'm not sure how "standard" such grouping is. I don't think the Realtek 8169 has it.
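
One way to check a particular NIC/driver combination is to read back its coalescing parameters; a minimal Linux sketch using the ETHTOOL_GCOALESCE ioctl (the interface name is a placeholder, and a driver with no coalescing support will simply fail the call or report zeros):

/* Sketch: read a driver's interrupt-coalescing settings via the
 * ETHTOOL_GCOALESCE ioctl on Linux.  Drivers without coalescing support
 * return an error; zeros generally mean "interrupt per packet". */
#include <linux/ethtool.h>
#include <linux/sockios.h>
#include <net/if.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    const char *ifname = "eth0";            /* placeholder interface name */
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0) { perror("socket"); return 1; }

    struct ethtool_coalesce ec;
    memset(&ec, 0, sizeof ec);
    ec.cmd = ETHTOOL_GCOALESCE;

    struct ifreq ifr;
    memset(&ifr, 0, sizeof ifr);
    strncpy(ifr.ifr_name, ifname, IFNAMSIZ - 1);
    ifr.ifr_data = (void *)&ec;

    if (ioctl(fd, SIOCETHTOOL, &ifr) < 0)
        perror("ETHTOOL_GCOALESCE (driver may not support coalescing)");
    else
        printf("%s: rx-usecs=%u rx-frames=%u\n",
               ifname, ec.rx_coalesce_usecs, ec.rx_max_coalesced_frames);

    close(fd);
    return 0;
}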

-- Robert

Reply to
Robert Redelmeier

The 16550A UART did that for serial ports some years ago. It collects characters until a threshold is met (selectable), or a timer goes off (in case no more characters are coming in).

That wouldn't seem hard to do on a NIC.
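
The policy itself is trivial to write down. A toy sketch of the 16550A-style rule - raise the interrupt when a count threshold is met or a holdoff timer expires - which is essentially what NIC interrupt coalescing does with frames instead of characters:

/* Toy model of the 16550A-style rule applied to a NIC: signal the host
 * when either `threshold` frames are pending or `holdoff_us` has elapsed
 * since the first pending frame.  Illustration only. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

struct coalescer {
    unsigned pending;        /* frames received but not yet signalled */
    uint64_t first_us;       /* arrival time of the oldest pending frame */
    unsigned threshold;      /* fire when this many frames are pending */
    uint64_t holdoff_us;     /* ...or when the oldest has waited this long */
};

/* Returns true when the device should raise an interrupt. */
static bool frame_arrived(struct coalescer *c, uint64_t now_us)
{
    if (c->pending == 0)
        c->first_us = now_us;
    c->pending++;

    if (c->pending >= c->threshold || now_us - c->first_us >= c->holdoff_us) {
        c->pending = 0;      /* host will drain everything queued so far */
        return true;
    }
    return false;
}

int main(void)
{
    struct coalescer c = { .threshold = 8, .holdoff_us = 100 };

    /* Ten frames arriving 20 us apart: at this arrival rate the 100 us
     * holdoff fires before the count threshold is reached, which bounds
     * the latency that coalescing adds. */
    for (uint64_t t = 0; t < 200; t += 20)
        if (frame_arrived(&c, t))
            printf("interrupt at t=%llu us\n", (unsigned long long)t);
    return 0;
}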

-- glen

Reply to
glen herrmannsfeldt

It is a little more complex than the UART case, because in the UART case all the characters are going raw to the same routine, but in the NIC case, the multiple packets (likely to different locations) are packed into the same buffer (with some kind of encapsulation to allow the boundaries to be determined by the stack.)
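
As a sketch of one possible encapsulation for such a shared buffer (my own illustration, not how any particular driver actually formats it): prefix each packet with its length so the stack can walk the buffer and recover the boundaries.

/* Illustration of one way multiple received packets could be packed into a
 * single host buffer: each packet is preceded by a 4-byte length, and the
 * stack walks the buffer to recover packet boundaries.  Not any particular
 * driver's real format. */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Append one packet to the shared buffer; returns the new write offset. */
static size_t pack(uint8_t *buf, size_t off, const uint8_t *pkt, uint32_t len)
{
    memcpy(buf + off, &len, sizeof len);        /* 4-byte length header */
    memcpy(buf + off + sizeof len, pkt, len);   /* packet bytes */
    return off + sizeof len + len;
}

/* Walk the buffer and hand each packet to the stack (here: just print). */
static void unpack(const uint8_t *buf, size_t used)
{
    size_t off = 0;
    while (off + sizeof(uint32_t) <= used) {
        uint32_t len;
        memcpy(&len, buf + off, sizeof len);
        printf("packet of %u bytes at offset %zu\n", len, off + sizeof len);
        off += sizeof len + len;
    }
}

int main(void)
{
    uint8_t buffer[4096];
    uint8_t a[60] = {0}, b[1500] = {0};   /* two received packets */
    size_t used = 0;
    used = pack(buffer, used, a, sizeof a);
    used = pack(buffer, used, b, sizeof b);
    unpack(buffer, used);
    return 0;
}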

Reply to
Walter Roberson

Ah - we were talking about different things I think - when I said per-packet or per-frame costs it was in the context of CPU consumed, not in the context of header overhead on the wire.

rick jones

Reply to
Rick Jones

It is, at the expense of latency:

ftp://ftp.cup.hp.com/dist/networking/briefs/nic_latency_vs_tput.txt

rick jones

Reply to
Rick Jones

1000 ns is equivalent to 125 bytes at gigabit speeds. 125 bytes is a lot less than the max frame size of 1500 bytes.

Reply to
Marris
