A common mistake, especially among less experienced users, is transferring binary files over FTP without selecting binary mode. If the binary option is not selected, the file is transferred in text (ASCII) mode, which corrupts the binary file at the destination. Imagine how destructive this simple mistake is when the files are backups that can no longer be transferred again in binary mode. You end up with corrupted backup data, wishing you had never made this mistake!
I recently faced this problem with tar.gz backup files stored on an external NTFS hard drive. The data had been transferred there via FTP, unfortunately in ASCII mode. The server holding the original data (CentOS 5) had problems with its hard disks, which were combined using RAID. One of the two disks was irreversibly damaged.
So I tried to recover the data from the backup drive, issuing the usual command to decompress and extract it.
- $ tar zxvf mysql_backup.tar.gz
The output showed a serious gzip error.
- gzip: stdin: invalid compressed data--format violated
- tar: Unexpected EOF in archive
- tar: Error is not recoverable: exiting now
That "not recoverable" error gave me the creeps. After googling around, I found that recovering such corrupted files is very difficult, because an FTP ASCII-mode transfer destroys data by transforming dissimilar original bytes into identical output values. Many of the suggested solutions were to transfer the file again in binary mode, which was impossible since the original data was lost. The only viable suggestion was to write a recovery program that recognizes and fixes the corruption. One suggested algorithm was to open the corrupted file as a byte stream and remove every carriage return (CR, 0d) that is immediately followed by a linefeed (LF, 0a). The probability that a binary file contains a genuine CR LF byte pair is quite low, so the number of valid pairs removed by mistake will not be very high. Therefore, you have a good chance of saving your data.
In the end I didn't implement such an algorithm (although it is quite easy), since I found an implementation in C on the official gzip site. They offer a program called fixgz that removes the extra CR (carriage return) bytes inserted by the transfer. However, there is absolutely no guarantee that this will actually fix your file. Despite the no-guarantee warning, I got my entire data backup back (which included images, files and SQL backups in tar.gz files) by doing the following:
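To illustrate the idea, here is a minimal Python sketch of that algorithm. It is not the fixgz program itself; the function names `strip_cr_before_lf` and `fix_stream` and the chunk size are my own choices for the example. Like any such tool, it will also remove a CR that genuinely belonged before an LF in the original file, so it comes with no guarantee.

```python
def strip_cr_before_lf(data: bytes) -> bytes:
    """Drop every CR (0x0d) that is immediately followed by LF (0x0a)."""
    return data.replace(b"\r\n", b"\n")


def fix_stream(src, dst, chunk_size=1 << 20):
    """Copy the binary file object src to dst, removing each CR before an LF.

    The input is read in chunks so that large backups need not fit in memory.
    """
    pending = b""  # a trailing CR held back until the next byte is known
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            # End of input: a held-back CR has no LF after it, so keep it.
            dst.write(pending)
            break
        buf = pending + chunk
        if buf.endswith(b"\r"):
            # The matching LF may be the first byte of the next chunk.
            buf, pending = buf[:-1], b"\r"
        else:
            pending = b""
        dst.write(strip_cr_before_lf(buf))
```

It could be used like this (file names as in my case):

```python
with open("corrupted_backup.tar.gz", "rb") as src, \
        open("fixed_backup.tar.gz", "wb") as dst:
    fix_stream(src, dst)
```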
- $ wget http://www.gzip.org/fixgz.zip
- $ unzip fixgz.zip
- $ gcc -o fixgz fixgz.c
- $ ./fixgz corrupted_backup.tar.gz fixed_backup.tar.gz
- $ tar zxvf fixed_backup.tar.gz
And it worked like a charm! The data was extracted 100% successfully. I wanted to share my experience because it will probably be useful to anyone out there who is desperate over this FTP ASCII problem. In conclusion: always enable binary mode when you transfer binary files over FTP.
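If you script your backups instead of typing commands into an FTP client, you can make binary mode the only option. Here is a small sketch using Python's standard `ftplib`, whose `storbinary` method switches the session to binary (image) type before sending. The helper names `upload_binary` and `send_backup`, and the host, user and password arguments, are placeholders for the example and not anything from my setup.

```python
from ftplib import FTP


def upload_binary(ftp, local_path, remote_name):
    """Upload local_path over an open FTP session in binary mode."""
    with open(local_path, "rb") as fh:
        # storbinary issues TYPE I (binary) before the STOR command,
        # so no CR bytes are inserted in front of LF bytes.
        ftp.storbinary(f"STOR {remote_name}", fh)


def send_backup(host, user, password, local_path, remote_name):
    """Connect, log in and upload one backup file (all arguments are placeholders)."""
    with FTP(host) as ftp:
        ftp.login(user, password)
        upload_binary(ftp, local_path, remote_name)
```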