Friday 10 June 2011

FIXED - PEM Email, Hosted Exchange and MS SQL

Dear Customer,

As you will be aware, we suffered a serious failure of our mass storage device (SAN). This device holds the email accounts and databases for many of our customers. It is built across two separate pieces of hardware and two different controllers to provide a redundant system.

At around 8.50pm on Wednesday evening the primary controller failed, and the secondary controller took over as planned. During the later part of the evening we noticed that the mail storage was not accessible even though the SAN was operational for other services. It transpired that the email storage had not transitioned across from the primary controller to the secondary. All attempts to make this change failed and at 11.15am on Thursday morning a decision was made to reboot the entire SAN to see if it would clear the issue.

What we did not realise was that after the full reboot the second controller would not come back online correctly either. This failure took our Hosted Microsoft Exchange and MS SQL service offline.

Within a couple of hours the decision was made to start making alternative arrangements by putting our disaster recovery program in place. This was first put into action for MS SQL, and we began restoring customer data as requested and required. At the same time we started building an alternative email system to get email flowing as quickly as possible.

We have been working closely with SUN/Oracle, the vendor of the hardware, who have had an engineer working on our system.

At around 11.50am on Friday, all Hosted Exchange and SQL services were functional and working correctly.

The SUN/Oracle engineer continued to work on our storage device and by 2pm had managed to re-establish the connection to the email store. Email is now flowing correctly but, as you can imagine, service is slightly slower than normal due to the huge amount of email travelling through the system.

Throughout these problems we continued to update our support news site at http://sointernetnews.blogspot.com and, for those using Twitter, @sointernet.

I would like to apologise for the problem and the inconvenience it has caused. This has been the first failure of this size in 5 years. We believed we had a redundant system that could cope with a failure like this. We will be reviewing our implementation fully over the coming days and adjusting our configuration as required.

Thank you again for your patience.

Hosted Exchange - FIXED but at RISK

Hi,

Just to let you know we have now restored our Hosted Exchange service without the need to do a disaster recovery.

We still consider this service to be at risk, as SUN/Oracle engineers are still working on the SAN and it may be reset at any point while final configuration updates are made.

Paul

PEM Email, Hosted Exchange and MS SQL - Further Update

Hi,

We are making good progress with SUN/Oracle now and are just awaiting details of an engineer's visit to site to fix the failed equipment.

They are currently running additional diagnostics and we should have an update shortly.

We have only around 10 MS SQL databases left to restore, after which that service will be fully functional.

PEM Hosted Email - We are still working to bring the mail back online and will update as soon as we have a revised time.

Thank you again for your patience.

Paul

PEM Email, Hosted Exchange and MS SQL - Update

Hi All,

We are progressing well on this issue, although it is still taking a long time.

We are still awaiting spare parts from SUN/Oracle and hope to have a delivery schedule shortly. In the meantime, here is an update on each problem.

Shared MS SQL - This has now been fixed and we are just restoring the databases, which should be finished over the next couple of hours.

Hosted Exchange - We are currently assessing the situation with this and should have a way forward shortly.

PEM Email - This is our major task as it affects a very large number of customers. We are hoping to get a solution running again in the next few hours to allow incoming email.

Paul

Thursday 9 June 2011

PEM Email, Hosted Exchange and MS SQL

Hi,

We are continuing to work on the major issue we are experiencing. We are awaiting spare parts from SUN/Oracle and at this time can't move much further forward until we have this new equipment. We are hoping these parts will arrive by the middle of tomorrow.

In the meantime we are rebuilding the MS SQL setup and restoring data from this morning's backups. This should be fixed by first thing in the morning.

On the PEM email side of things, we are currently building a replacement solution and hope to restore customers' ability to receive incoming email within the first few hours of the working day on Friday.

Thank you again for your patience. I apologise for the inconvenience this will have caused you.

Paul

Continued PEM email and Hosted Exchange/MS SQL

Hi,

We are working with Oracle to find a solution to the SAN issue that we are having.

We are going to try another couple of resets to see if this will clear the issue.

I will update again shortly.

Paul

SAN Reboot Update

Hi,

The SAN reboot has not resolved the problem so far, but it has identified the cable/controller issue.

We continue to work on this and will update you shortly.

Paul

Emergency SAN reboot

Hi All,

If you receive Hosted Exchange or shared MS SQL services from us, please note that we will be resetting our SAN at 11.15 this morning.

This reset should only take around 10 minutes and will hopefully clear the mail issue we are having at the same time.

Thank you again.

Paul

PEM Inbound email problems - Update

Hi,

We have now located the issue with the SAN storage.

The problem lies either with one of the fibre cables or one of the controllers.

We are very close to fixing this issue and I will update you shortly.

Thank you again for your patience.

Paul

Wednesday 8 June 2011

PEM Inbound email problems

Hi All,

If your email is hosted in our shared system called PEM, please be aware that we are currently experiencing issues with it.

We currently have a problem with the way our mail system talks to our storage disks. For some people this is resulting in them not finding any email in their mailboxes.

All new emails are being queued for delivery as soon as we have fixed the issue.

Engineers have been alerted to the problem and we expect to have a resolution over the coming hours, by 11am on 9th June 2011 at the latest.

Thank you for your patience.

Paul

Outgoing Mail issue update

Hi all,

We have received a number of calls regarding outgoing mail issues.

We recently introduced a new outbound server into the network to help with the increase in outgoing emails. We noticed a configuration issue on this server that stopped it accepting incoming mail on port 588. We have now rectified this issue and all services are running correctly.

Once again thanks for your patience.

Tuesday 7 June 2011

Minor ADSL issue 7th June 2011

Dear all,

We were alerted to an issue a couple of hours ago. This problem stopped some users being able to reconnect to our broadband service.

We have identified the cause of this block, which affected just one of our 3 pieces of termination equipment.

We have just run a script to reset and clean up the user connections, and the service should now be running correctly.

Please do contact support if you continue to experience issues.

Paul