FCIP issues with SAN replication

Greetings all,

My company is attempting to perform replication from one HP EVA SAN array to another across the WAN. We have a metro Ethernet connection between the two sites with one gigabit of shared bandwidth. We share the bandwidth with our other business units, with no QoS in place, but we have been told that the pipe has never been completely saturated, and we're not rate limited. The SAN arrays are on 4 Gbps Fibre Channel Brocade switches. Two devices called MPX110s send the data from Fibre Channel to Ethernet. Each MPX has redundancy groups it performs replication for, and although each has two Ethernet and two Fibre Channel ports, we only use one of each. Each MPX110 has a path it replicates over to its counterpart on the other side. It is my understanding that they negotiate a tunnel between them: Fibre Channel over IP. They're each on their own 6509, which has an uplink to a 3750; that goes across the metro Ethernet to a 3560 on the other side, then up to a 3560 acting as the core and out to two 3560s with an MPX on each one.

Now the problem: although we have one gigabit of bandwidth, each connection will only use about 13Mbps of it. We've verified this with iperf; parallel tests show each connection gets 13Mbps of bandwidth. The HP engineer told us that at >5Mbps we get approximately 1.3Mbps of actual data, which means that FCIP has 80% overhead? Can that be right? The big huge problem is that after running for several hours they'll eventually just die and have to be rebooted to start replicating again. They're already on the latest firmware (2.4.4.1). The only error we get from the statistics screen of the MPXs says they're getting TCP timeouts.

I've performed captures on both sides' MPXs, and the errors I see in a 60-second sample are FCP malformed packets (~4300), duplicate ACKs (~41), previous segment lost (~3), and fast retransmission (~3). When HP was questioned about the FCP malformed packets, they stated that they use a proprietary protocol and that Wireshark wouldn't be able to decode it. I've since searched for this protocol but can find no references to it anywhere. The other errors seem so minor and few that it would be hard to believe they're impacting the data stream that much, if at all.

I'll include a small sample of the captures, if it lets me.

Thanks in advance for your assistance.

Chandler Bing

Reply to
Chandler Bing

Meanwhile, at the comp.dcom.sys.cisco Job Justification Hearings, Chandler Bing chose the tried and tested strategy of:

My first instinct would be to simulate the 1G WAN by bringing the two units together and linking them with a simple gigabit link [or even 100M to simulate a worst case scenario], and working upwards in complexity from there. Easier said than done, of course. When you get it working you can have a look at a packet capture to give you a rough idea of what it should look like.

Rest assured that the second point will be addressed as soon as you address the first one :-)

Vendor support have been known to be correct on occasion so I'll reserve judgement for now.

Reply to
alexd

How far apart are the two units being replicated? Minimum ping rtt is the best distance metric.

Reply from 208.69.34.231: bytes=32 time=49ms TTL=56
Reply from 208.69.34.231: bytes=32 time=51ms TTL=56
Reply from 208.69.34.231: bytes=32 time=48ms TTL=56
Reply from 208.69.34.231: bytes=32 time=48ms TTL=56

So in this case the min rtt is 48ms.

This may be relevant:-

formatting link
Look up "bandwidth delay product".
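For what it's worth, here's a back-of-the-envelope sketch of why the bandwidth-delay product matters here. It assumes a classic 64 KB TCP window with no window scaling (the MPX110's actual window size is unknown), which caps per-connection throughput at roughly the ~13 Mbps the OP is seeing:

```python
# Window-limited TCP throughput: at most one window of data can be
# in flight per round trip, so throughput <= window_size / RTT.
# Assumption: a 64 KB receive window (the pre-window-scaling default);
# the MPX110's real window size is not known.

def max_throughput_mbps(window_bytes, rtt_seconds):
    """Per-connection TCP throughput ceiling in Mbps."""
    return window_bytes * 8 / rtt_seconds / 1e6

window = 64 * 1024   # 65536 bytes
rtt = 0.048          # 48 ms, the min RTT from the ping sample above

print(f"{max_throughput_mbps(window, rtt):.1f} Mbps")  # prints 10.9 Mbps

# Conversely, filling the whole 1 Gbps pipe at this RTT would need:
bdp = 1e9 / 8 * rtt  # bandwidth-delay product in bytes
print(f"{bdp / 1024:.0f} KB of window")  # prints 5859 KB of window
```

So with a 48 ms RTT and no window scaling, ~11 Mbps per connection is about what TCP can do regardless of how fat the pipe is.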

Reply to
bod43

FWIW, FCIP is a standard protocol - if HP have written something non-standard then they should have it documented...

Note that a sniffer would normally refuse to decode something it doesn't understand, unless whoever wrote the protocol didn't follow whatever escape clauses are built in to allow non-standard formats inside the standard wrapper.

If you were being paranoid, you would find an HP analyser and see if that shows errors.

There's a Cisco doc about designing with FCIP:

formatting link
Note the comments about applications being "synchronous" or not.

FC seems to use a guaranteed-buffer scheme, where there needs to be enough buffering to cope with the path delay to get wire-speed throughput.

I have run into issues with buffer credits in FC switches, where you need enough to cope with the speed/delay.

As other posters have commented, check the timing. The GigE link may not follow a direct route, so you may have more delay than you or the protocols expect.

Reply to
Stephen

Chandler Bing chose the tried and tested strategy of:

HP did something similar in a lab: they set up the MPXs on a single switch (no WAN) and had them replicating at breakneck speeds. When we pointed out that there was no WAN delay, bandwidth limitation, or other devices in the mix, they simply shrugged at us.

We have implemented QoS, which seems to have given it additional bandwidth (not sure why). We've seen it climb to 45Mbps, but it then failed after 12 hours.

Reply to
Chandler Bing

The two FC gateways and SANs are 1300~1500 miles apart, RTT ping times are ~36ms with very little jitter.

Alas, the article you linked has no solution. I found several like this, which we pointed out to HP.

Reply to
Chandler Bing

I've since gotten Wireshark to decode it properly; I had to disable the FCP decode.

I suspect that's at least part of our problem, as I believe the SAN replication traffic over FCIP is synchronous. The QoS policy we implemented helped, but it still fails with a high TCP timer expired error count. I have nothing to correlate this to in my Wireshark captures, though.

If the FC switches have a buffer configuration, I'm not familiar with it; I'm relying heavily on HP and our SAN engineer to configure those pieces. It is interesting that we have little to no visibility into the FC switches and any errors occurring on that side. The focus seems to be on the Ethernet and network side.

Reply to
Chandler Bing

Now the update:

I discovered that if I disable the FCP decode, Wireshark does decode it correctly as FCIP.

We applied a QoS config to flag the SAN replication traffic as DSCP EF and have since seen consistent ping times of ~36ms between sites and the bandwidth climb as high as 45Mbps on a 1Gbps link. They still fail after replicating for a few hours; last time we watched them replicate for 12 hours and then fail. The TCP timer expired counter seems to indicate that is the problem, but I have nothing significant in the Wireshark captures to support this.
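Working backwards from those numbers (a sketch that assumes throughput is purely TCP-window-limited, with no loss or gateway buffering in play), the observed 45 Mbps ceiling implies an effective window of only about 200 KB:

```python
# If a TCP flow is window-limited, then at steady state:
#   throughput = window / RTT   =>   window = throughput * RTT
# Assumption: the 45 Mbps ceiling is set by windowing alone, not by
# packet loss or the MPX110's internal buffering.

rtt = 0.036               # 36 ms observed RTT between sites
throughput_bps = 45e6     # observed peak, 45 Mbps

window_bytes = throughput_bps / 8 * rtt
print(f"effective window ~{window_bytes / 1024:.0f} KB")  # prints ~198 KB
```

That would be consistent with the gateways advertising a fixed, modest window rather than scaling up to fill the 1 Gbps pipe.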

HP has decided that the MPX110 on the far side needs to be replaced. I'll post an update after that's done.

Reply to
Chandler Bing

Well, as expected, I guess your issues are likely being caused by the round-trip delay.

HP have been pretty lame in re-creating your environment, since they have failed to emulate the RTT delay.

Perhaps they could consider using NISTNet? This is a WAN emulator which runs on Linux and very nicely creates a WAN in a lab.

Obviously you will need a decent computer to run NISTNet if you want GbE performance. I last used it about 4 years ago, on an HP DL360 G3 server using the two built-in GbE interfaces. I forget now what bandwidth we needed, but it may well have been GbE.

I see that there now seems to be something called WANem mentioned in the googlesphere. Might be worth a look. NISTNet for sure works a treat, but it has been around for years 'n' years and needs a bit of Linux knowledge to get going.

I have never used FC in anger and don't know much about it; however, my guess is that there is a VERY good reason that it is not more widely deployed. Everything I have ever heard about it makes the hairs on the back of my neck tingle alarmingly. For example, I would bet that Google don't use it AT ALL in any of their data centres.

Of course loads of people use FC, but loads of people used token ring too despite it being completely crap.

If you like, I will consult on this for you and get to the bottom of it. The email address in the headers does work, although I don't read it very often and it fills up with spam. Send me mail, and perhaps post here mentioning when the mail was sent in case I miss it among the junk.

OK, let me explain. TR was crap because every single computer in the ring had to correctly process every single bit communicated. A single bit error by ANY single computer in the ring resulted in a lost packet. Twisted-pair (or of course yellow "garden hose") Ethernet is hundreds of times more robust. Token Ring relied on Chinese whispers in order to work at all. It was simply a daft idea driven by effective sales and marketing. No one ever got fired for buying IBM. :)

Reply to
bod43

replying to Chandler Bing, John CG wrote: Hi! Reading all the drama in this thread is kind of entertaining.

Seriously, this thread is five years old. Now we have 2G FC disks for 10 dollars each. I have done hard-drive speed tests, and these 10-dollar disks easily send data at over 100 MBytes per second over a fiber link to my Linux box - 800 Mbps, enough to nearly saturate your 1G FCIP link.

Why didn't you first take the FC switches out and test the raw 1Gbps link? The test needs two of the 10-dollar hard drives and two Linux computers, each with a QLogic adapter. QLogic adapters are 5 dollars each. You should be able to copy a 300GB disk over that 1Gbps link using FTP in about 3000 seconds - close to or less than an hour, at about 100 MBytes (800 Mbps) per second, the same performance as a home Linux box.

If that cannot be achieved, the FCIP link is the first problem to solve. If it can, then moving your disks to the home-grade Linux boxes is the solution.
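The transfer-time arithmetic above does check out (a quick sketch, assuming "GB" means 10^9 bytes and ~80% usable goodput on the 1 Gbps link):

```python
# Time to copy a 300 GB disk over a 1 Gbps link.
# Assumption: ~80% of line rate (800 Mbps) is usable goodput after
# protocol overhead -- matching the poster's 100 MB/s figure.

disk_bytes = 300e9
usable_bps = 0.8 * 1e9   # 800 Mbps assumed goodput

seconds = disk_bytes * 8 / usable_bps
print(f"{seconds:.0f} s (~{seconds / 60:.0f} min)")  # prints 3000 s (~50 min)
```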

Reply to
John CG
