Using writev() vectored write system call with VxFS can cause silent data corruption


  • Article ID:100033372
  • Last Published:
  • Product(s):InfoScale & Storage Foundation

Problem

The writev() system call issues a vectored write, which bundles multiple write vectors (iovecs) into a single atomic write.
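For readers unfamiliar with vectored writes, the following is a minimal sketch of a single writev() call that bundles two buffers. The function name write_two_vectors and the buffer sizes (2048 and 6144 bytes, neither a multiple of 4K) are illustrative assumptions and are not taken from any affected application.

```c
#include <sys/types.h>
#include <sys/uio.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of any product): writes a 2048-byte buffer
 * and a 6144-byte buffer to 'path' with one writev() call.
 * Returns the number of bytes written, or -1 on error.
 */
ssize_t write_two_vectors(const char *path)
{
    static char first[2048];
    static char second[6144];
    struct iovec iov[2];
    ssize_t written;
    int fd;

    /* Fill the buffers with non-null patterns so nulls on disk stand out. */
    memset(first, 0xF5, sizeof first);
    memset(second, 0xF7, sizeof second);

    fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0666);
    if (fd < 0)
        return -1;

    /* Describe both buffers; the kernel writes them in array order. */
    iov[0].iov_base = first;
    iov[0].iov_len  = sizeof first;
    iov[1].iov_base = second;
    iov[1].iov_len  = sizeof second;

    written = writev(fd, iov, 2);
    close(fd);
    return written;
}
```

A caller would normally compare the return value with the sum of the iov_len fields (8192 here) to detect a short write.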

When the writev() call is used with a VxFS filesystem it can result in null data being written to the file.

Error Message

There is no error message. The data corruption is silent and can only be detected by checking the data written to disk.
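As a rough illustration of checking the data on disk, the sketch below counts null bytes in a byte range of a file. It is only meaningful when the application knows that range should contain non-null data, because files can legitimately contain nulls. The function name count_null_bytes is a hypothetical helper, not a VxFS or InfoScale utility.

```c
#include <sys/types.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: counts zero bytes in the range
 * [offset, offset + length) of the file at 'path'.
 * Returns the count, or -1 on error or if the file is shorter than the range.
 */
long count_null_bytes(const char *path, off_t offset, size_t length)
{
    char buf[4096];
    long nulls = 0;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    if (lseek(fd, offset, SEEK_SET) == (off_t)-1) {
        close(fd);
        return -1;
    }

    while (length > 0) {
        size_t want = length < sizeof buf ? length : sizeof buf;
        ssize_t got = read(fd, buf, want);

        if (got <= 0) {          /* read error or unexpected end of file */
            close(fd);
            return -1;
        }
        for (ssize_t i = 0; i < got; i++) {
            if (buf[i] == '\0')
                nulls++;
        }
        length -= (size_t)got;
    }

    close(fd);
    return nulls;
}
```

A non-zero result for a range that was written with non-null data would be a reason to investigate further; it does not by itself prove that this issue was the cause.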

Cause

If a writev() call contains two or more iovecs and any iovec after the first one fails because its source page is no longer in memory, null data can be written. The failure causes all the previous writes to be undone, but only the failed iovec write is accounted for, so the area of partially written pages is filled with nulls.
 
For this problem to occur, the writes must span across pages.

Applications that use page-aligned vectored writes whose sizes are multiples of 4K are not affected, for example vectored writes that use a block size of 4K or 8K.
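The condition above can be sketched as a simple check that an application developer might run before issuing a vectored write. The function name iov_is_page_multiple is hypothetical, and the sketch assumes that "page aligned" refers to the starting file offset; it queries the page size with sysconf() rather than hard-coding 4K.

```c
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns 1 if the starting file offset and every
 * iovec length are multiples of the system page size, otherwise 0.
 * Assumption: "page aligned" is interpreted as alignment of the file offset.
 */
int iov_is_page_multiple(off_t file_offset, const struct iovec *iov, int iovcnt)
{
    long page = sysconf(_SC_PAGESIZE);

    if (page <= 0)
        return 0;                      /* page size unknown: treat as unsafe */
    if (file_offset % page != 0)
        return 0;                      /* write does not start on a page boundary */

    for (int i = 0; i < iovcnt; i++) {
        if (iov[i].iov_len % (size_t)page != 0)
            return 0;                  /* this vector would end mid-page */
    }
    return 1;
}
```

With a 4K page size, vectors of 4096 and 8192 bytes at offset 0 pass the check, while vectors of 2048 and 6144 bytes do not.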

Solution

The issue only affects Linux platforms, and only when the writev() system call is used with I/O block sizes that are not multiples of the page size (4K).

To check whether a process uses the writev() system call, trace it with strace. In the following example, the program 'test_program' opens file descriptor 4 against the file '/vol1/test_file' and then issues a writev() call on file descriptor 4:

 strace -f -e writev,open ./test_program


14610 open("/vol1/test_file", O_RDWR|O_CREAT|O_TRUNC, 0666) = 4
14610 writev(4, [{"\365\365\365\365\365\365\365\365\365\365\365\365\365\365\365\365\365\365\365\365\365\365\365\365\365\365\365\365\365\365\365\365"..., 2048}, {"\367\367\367\367\367\367\367\367\367\367\367\367\367\367\367\367\367\367\367\367\367\367\367\367\367\367\367\367\367\367\367\367"..., 6144}], 2) = 8192

Alternatively, contact your application vendor to find out whether vectored writes (writev()) are used in a way that could be affected by this issue.

Note that neither DB2 nor Oracle is affected by this issue.

Oracle does not make use of vectored writes and always makes sure to do properly aligned writes.

Whilst DB2 does make use of writev(), it uses it in a way that is not affected by this issue.

The issue is fixed in the following patch and hotfix releases. The table will be updated as fixes become available. Please contact support for the private hotfix until a public patch is available.
 
Affected VxFS versions   Required Public GA VxFS patches   Private Hot-fixes
6.0   6.0.5.406
6.1   6.1.1.402
6.2   6.2.1.111 or 6.2.1.301 or 6.2.1.701
7.0   7.0.1.003
7.1   7.1.0.005 or 7.1.0.101
7.2   7.2.0.002

 
