Channel: VMware Communities Syndication Feed

Upgrade your NetApp DataONTAP! (7.3.3P2/P4)

 

Are you a NetApp customer?

Do you know which version of DataONTAP your storage system(s) are running?

 

 

As of today, we are running 7.3.1.1P8. This system had 337 days of uptime. I was anxiously awaiting my 1 year milestone, as some of you may have seen me posting about it on Twitter. Some of the NetApp crowd was also involved in sharing/bragging about this. I also did not want to update it prior to being gone nearly the entire month of September for VMworld and Oracle OpenWorld.

 

 

Enter BUG #332110: Driver refresh for X1008, X1010 dual port 10G ethernet card

 

 

***If you're a NetApp customer, you can click here to see the full report

 

 

Driver refresh from vendor to fix known problems in both hardware and software.

The vendor found a problem with the Media Access Control (MAC) in the T3B2 (not the T3C) revision of the chip. Only the X1107 uses T3C in NetApp.

When the MAC is under high load, it can get into a mode where it does nothing except transmit pause frames. It will not even forward received traffic up to the host. Only rebooting the filer can get the MAC out of that mode. The driver refresh includes a workaround that detects and resets the MAC portion of the chip at run time.

Internal MAC flow control is enabled in the refreshed driver; a portion of it was ported to the old driver to fix bug 313558.

Support for the X1106/X1107 is added in this driver.

 

 

So begins our story...

 

 

Friday night (technically Saturday morning), I get a call around 3AM from a DBA who says, "Hey, Oracle can't access its NFS mounts..."

 

 

*sigh* "OK, let me check..."

 

 

Dig a little deeper, call our Network Engineer, and then see that ALL VMs are disconnected. Oh dear.

 

 

Dig a little deeper, and see that NO 10GbE traffic is passing to the NetApp. Oh dear.

 

 

We had hit the bug. And the really bad part? It didn't actually take the filer or the interfaces down/offline, so cluster failover didn't take place.

 

 

What did this do for us? OPENED OUR EYES! We need to keep up with patches/upgrades better. We need a physical Domain Controller in place, because when the filer tried to come back up, our virtual domain controllers were still offline, and therefore, the authentication part of the "giveback" process failed.

 

 

Sidenote: "Dear NetApp, please give us some flexibility on the Active Directory integration. I've got dozens of Domain Controllers all around the country, but the installation only looks in the local site/subnet where the filer resides. Had it traversed outside, it would have found MANY online DCs."

 

 

We also need more granular monitoring of latency to our storage systems, because technically, the storage never went down.
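One cheap way to catch this class of failure, where the storage reports "up" but serves nothing, is an end-to-end probe that times a small synchronous write to the NFS mount and alerts when latency crosses a threshold. A minimal sketch (the probe path and the 500 ms threshold are assumptions for illustration, not values from our environment):

```python
import os
import time

def probe_write_latency(path, size=4096):
    """Time one small fsync'd write to `path`; returns latency in seconds.

    If the filer is transmitting only pause frames, this call stalls even
    though the interfaces still report link-up, so a latency threshold
    catches the failure that a simple up/down check misses."""
    buf = b"\0" * size
    start = time.monotonic()
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        os.write(fd, buf)
        os.fsync(fd)  # force the write through to the server
    finally:
        os.close(fd)
        os.unlink(path)  # clean up the probe file
    return time.monotonic() - start
```

Run it from cron against a throwaway file on the NFS mount and page when, say, `probe_write_latency("/mnt/oracle_nfs/.probe") > 0.5` (hypothetical path and threshold).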

 

 

Lastly, I had made some configuration changes to our VIF configuration over the course of 337 days of uptime. What I didn't realize is that those changes were never written to the RC file, which is what initializes and creates all of your VIFs on boot. So, we came back up on a single interface. Essentially, our entire company is running on a single 10GbE interface now. We made the decision to get the company back up and online rather than troubleshoot/reconfigure everything, and to deal with this in a later planned downtime.
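On Data ONTAP 7-mode, VIF and interface changes made at the console take effect immediately but vanish at boot unless they are also written into /etc/rc. A quick sanity check looks something like this (the VIF name and addressing below are examples, not this filer's real configuration):

```
# Show the running VIF configuration
filer> vif status

# Show what will actually be recreated at boot
filer> rdfile /etc/rc

# If they differ, the missing lines need to go into /etc/rc, e.g.:
#   vif create lacp vif0 -b ip e1a e1b
#   ifconfig vif0 10.0.0.10 netmask 255.255.255.0 up
```

Diffing `vif status` against `/etc/rc` after every change is the habit that would have saved us here.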

 

 

Tomorrow night, we have to take an additional 2-4 hour planned emergency outage to correct all of these things. Why? Because I didn't stay on top of my upgrades, and I didn't stay on top of my configs.

 

 

Lesson learned? You bet.

 

 

If you're using NetApp storage with any of the bleeding edge stuff like 10GbE/FCoE expansion cards, keep your stuff up to date. It is bound to have bugs; all software does.

 

 

-Nick

 

 

