I/O Error???

Last Post 31 Jan 2008 01:46 AM by nbkr3bi. 7 Replies.
nbkr3bi
New Member
21 Jan 2008 12:25 AM
We have a database maintenance job that is scheduled to run every day.
It backs up all the databases on the server. Additionally, performance maintenance runs on Saturdays in the same job.
The job has failed on the last couple of Saturdays.
On further analysis, we found the following errors in the SQL Server log. It looks like a hardware I/O (hard disk) issue, but we can't figure out why the failure does not happen on any day other than Saturday.
Could someone help us interpret the log?

2008-01-20 01:27:30.05 server SQL Server terminating because of system shutdown.

2008-01-20 01:27:31.48 spid53 BackupMedium::ReportIoError: write failure on backup device 'VDI_BECDF8AB-F059-45B4-8323-2A9F4DB23D62_0'. Operating system error 6(The handle is invalid.).

2008-01-20 01:27:31.51 spid53 LogEvent: Failed to report the current event. Operating system error = 1717(The interface is unknown.).

2008-01-20 01:27:31.51 backup BACKUP failed to complete the command BACKUP database [AAA] TO VIRTUAL_DEVICE=....

2008-01-20 01:27:31.51 spid53 Internal I/O request 0x1A757740: Op: Write, pBuffer: 0x0F020000, Size: 1048576,...

2008-01-20 01:27:31.51 spid53 BackupMedium::ReportIoError: write failure on backup device 'VDI_BECDF8AB-F059-45B4-8323-2A9F4DB23D62_3'. Operating system error 6(The handle is invalid.).

2008-01-20 01:27:31.52 spid53 Internal I/O request 0x4C8C3D08: Op: Write, pBuffer: 0x0EC20000, Size: 327680, ...

2008-01-20 01:27:33.10 spid53 Internal I/O request 0x4C1F2050: Op: Write, pBuffer: 0x0F520000, Size: 1048576,...

2008-01-20 01:27:33.10 spid53 BackupMedium::ReportIoError: write failure on backup device 'VDI_BECDF8AB-F059-...

2008-01-20 01:27:33.15 spid53 Internal I/O request 0x4C17BB60: Op: Write, pBuffer: 0x0E520000, Size: 1048576,...

2008-01-20 01:27:33.15 spid53 BackupMedium::ReportIoError: write failure on backup device 'VDI_BECDF8AB-F059-...

2008-01-20 01:27:33.57 spid53 BackupVirtualDeviceFile::RequestDurableMedia: Flush failure on backup device 'V...

2008-01-20 01:27:33.60 spid53 BackupVirtualDeviceFile::RequestDurableMedia: Flush failure on backup device 'V...

2008-01-20 01:27:33.63 spid53 BackupVirtualDeviceFile::RequestDurableMedia: Flush failure on backup device 'V...

2008-01-20 01:27:33.66 spid53 BackupVirtualDeviceFile::RequestDurableMedia: Flush failure on backup device 'V...


Thank you.
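
One way to start interpreting a log like the one above is to pull out the first event and tally the operating system error codes. The following is only a minimal sketch in Python: the sample lines are copied from the excerpt, while the regular expressions and the `summarize` helper are illustrative assumptions, not part of any SQL Server tooling.

```python
import re
from collections import Counter

# Sample lines copied from the log excerpt in this post.
LOG_LINES = [
    "2008-01-20 01:27:30.05 server SQL Server terminating because of system shutdown.",
    "2008-01-20 01:27:31.48 spid53 BackupMedium::ReportIoError: write failure on backup device "
    "'VDI_BECDF8AB-F059-45B4-8323-2A9F4DB23D62_0'. Operating system error 6(The handle is invalid.).",
    "2008-01-20 01:27:31.51 spid53 LogEvent: Failed to report the current event. "
    "Operating system error = 1717(The interface is unknown.).",
    "2008-01-20 01:27:31.51 spid53 BackupMedium::ReportIoError: write failure on backup device "
    "'VDI_BECDF8AB-F059-45B4-8323-2A9F4DB23D62_3'. Operating system error 6(The handle is invalid.).",
]

# Assumed line layout: timestamp, source (server / spidNN / backup), message.
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{2})\s+(\S+)\s+(.*)$")
# Matches both "error 6(...)" and "error = 1717(...)".
OS_ERR_RE = re.compile(r"Operating system error\s*=?\s*(\d+)\((.*?)\)")


def summarize(lines):
    """Return the first parsed event and a count of each OS error code seen."""
    first_event = None
    codes = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # skip blank or wrapped lines
        timestamp, source, message = match.groups()
        if first_event is None:
            first_event = (timestamp, source, message)
        for code, text in OS_ERR_RE.findall(message):
            codes[(int(code), text)] += 1
    return first_event, codes


first_event, codes = summarize(LOG_LINES)
print("First event:", first_event)
for (code, text), count in codes.most_common():
    print(f"OS error {code} ({text}): {count} occurrence(s)")
```

In this excerpt the first event is the system shutdown message, logged about a second before the first write failure. That ordering suggests the invalid-handle errors may be a side effect of the shutdown rather than proof of a disk fault, though the log alone does not settle it.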
SQLUSA
New Member
21 Jan 2008 03:21 AM
Most important: What is the difference between yesterday and today? What changed? What is different during weekends?

Likely causes:

1. Out of disk space
2. Network error
3. Another program, such as anti-virus, kicks in

Kalman Toth - Database, Data Warehouse & Business Intelligence Architect
SQLUSA: http://www.sqlusa.com/order2005grandslam/ The Best SQL Server 2005 Training in the World!
SQLUSA
New Member
21 Jan 2008 03:23 AM
Sorry, I left out hardware failure. It is a distant possibility, but it cannot be completely dismissed.
SQLUSA
New Member
21 Jan 2008 06:22 AM
>If it only happens on Saturday, we can probably rule out a hardware issue

Good thinking, Watson!

Let's scratch hardware and address the real question: WHAT IS DIFFERENT?

Kalman Toth - Database, Data Warehouse & Business Intelligence Architect
SQLUSA: http://www.sqlusa.com/order2005repo...gservices/ The Best SQL Server 2005 Training in the World!
nbkr3bi
New Member
22 Jan 2008 08:28 PM
I checked the Windows event log. I see a lot of errors for "NetBackup".

Some of them read:
TLD(0) [4684] timed out after waiting 311 seconds for ready, drive 9
TLD(0) drive 9 (device 8) is being DOWNED, status: Unable to open drive
Check integrity of the drive, drive path, and media
Error: 7105, Severity: 22, State: 6
Drive Sw70RDNUtility1L1_00 has been disabled: It is no longer responding to requests from this system.
TLD(0) unavailable: initialization failed: Control daemon connect or protocol error

There are also a few for "MSSQLSERVER":
Page (1:4733697), slot 31 for text, ntext, or image node does not exist.
18278 :
Database log truncated: Database: AAAAAAAAA.
Unable to read local eventlog (reason: The event log file has changed between read operations).

Can anyone help in understanding these errors?
SQLUSA
New Member
22 Jan 2008 08:54 PM
Is that a tape drive?

Have you tried a new tape drive? Have you checked with the NetBackup folks?

What else are you doing on Saturday that you are not doing on Friday, for example?

Kalman Toth - Database, Data Warehouse & Business Intelligence Architect
SQLUSA: http://www.sqlusa.com/order2005grandslam/ The Best SQL Server 2005 Training in the World!
nbkr3bi
New Member
31 Jan 2008 01:46 AM
As it turns out, there were some consistency and reference errors in one of the tables, which caused DBCC CHECKDB to fail and, in turn, the whole job to fail. Running DBCC CHECKTABLE with REPAIR_ALLOW_DATA_LOSS fixed the problem.

Thanks to everyone who helped.
nbkr3bi
New Member
31 Jan 2008 06:22 PM
Thanks guys.
We did try REPAIR_FAST and REPAIR_REBUILD, but they didn't help. Also, the contingency backup reported the same errors, so it wasn't possible to restore.
Hence we decided to go with REPAIR_ALLOW_DATA_LOSS. However, the row count of the table was the same before and after running it. Could there be any other kind of data loss?
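
A matching row count only shows that no whole rows disappeared; it would not reveal a column value that was changed or emptied. Whether that happened here is not something this thread establishes. As a rough sketch, assuming rows can still be exported from a pre-repair copy and from the repaired table, a per-row fingerprint comparison in Python could look like this. The helper names and sample rows are hypothetical, not taken from the actual database.

```python
import hashlib


def row_fingerprints(rows, key_index=0):
    """Map each row's key to a SHA-256 digest of its other column values."""
    fingerprints = {}
    for row in rows:
        key = row[key_index]
        payload = "\x1f".join(
            repr(value) for i, value in enumerate(row) if i != key_index
        )
        fingerprints[key] = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return fingerprints


def diff_rows(before, after, key_index=0):
    """Return (keys missing after the repair, keys whose content changed)."""
    old = row_fingerprints(before, key_index)
    new = row_fingerprints(after, key_index)
    missing = sorted(k for k in old if k not in new)
    changed = sorted(k for k in old if k in new and old[k] != new[k])
    return missing, changed


# Hypothetical sample data: same row count, but one text value was lost.
before = [(1, "alpha", "long text body"), (2, "beta", "another body")]
after = [(1, "alpha", "long text body"), (2, "beta", None)]

missing, changed = diff_rows(before, after)
print("Row counts equal:", len(before) == len(after))
print("Missing keys:", missing)
print("Changed keys:", changed)
```

With the sample data the row counts match and no key is missing, yet key 2 is reported as changed, which is the kind of loss a row count alone cannot show.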