Sunday, March 18, 2012

Find duplicate files using VB.Net

When you receive files from different hosting servers, the same content can arrive under different file names. Comparing file names alone is therefore not enough to find duplicates. The file data itself has to be compared, and there are several ways to do that.

To find duplicate files even after they have been renamed, read the content of each file and compare that data. Once the content is available as bytes, it can be hashed with the MD5 algorithm, and the resulting hash strings can be compared. MD5 is a widely used cryptographic hash function with a 128-bit hash value, and is also commonly used to check the integrity of files.

The .NET Framework has rich support for hashing, encrypting and decrypting. Computing hashes and encrypting data using a variety of algorithms is very easy. Use the ComputeHash() method to compute the MD5 hash.
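As a minimal sketch of the ComputeHash() call described above, the following console module hashes two byte arrays and compares the results. The module name HashSketch, the helper Md5Hex and the sample strings are made up for illustration. The hash is formatted as hexadecimal text so that every byte of it survives the conversion to a string.

```vbnet
Imports System
Imports System.Security.Cryptography
Imports System.Text

Module HashSketch
    ' Hypothetical helper: returns the MD5 hash of a byte array as a lowercase hex string.
    Function Md5Hex(ByVal data As Byte()) As String
        Using md5Hasher As MD5 = MD5.Create()
            Dim hashBytes As Byte() = md5Hasher.ComputeHash(data)
            ' BitConverter.ToString yields "AB-CD-..."; strip the dashes for a compact form.
            Return BitConverter.ToString(hashBytes).Replace("-", "").ToLowerInvariant()
        End Using
    End Function

    Sub Main()
        Dim first As Byte() = Encoding.UTF8.GetBytes("same content")
        Dim second As Byte() = Encoding.UTF8.GetBytes("same content")
        Dim third As Byte() = Encoding.UTF8.GetBytes("other content")

        Console.WriteLine(Md5Hex(first) = Md5Hex(second))  ' identical bytes give identical hashes
        Console.WriteLine(Md5Hex(first) = Md5Hex(third))   ' different bytes give different hashes here
    End Sub
End Module
```

An MD5 hash is 16 bytes, so the hex string returned by Md5Hex is always 32 characters long.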

MD5 works on bytes, so hash the raw bytes of the file rather than text decoded from it. Passing the content or the hash through ASCIIEncoding turns every value above 127 into "?", so different files, or different hashes, can end up comparing as equal. Handing the file stream to ComputeHash() and formatting the result with BitConverter.ToString() avoids this. The same function can be called while walking a directory tree to check for all the duplicates. Once all the files in the directory have been scanned and compared, it is much easier to delete the duplicate files found.

' At the top of the form's code file:
Imports System.IO
Imports System.Security.Cryptography

Private Sub Button2_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button2.Click
    If CompareFiles("D:\FirstFile.txt", "D:\FirstFile1.txt") Then
        MsgBox("duplicate")
    Else
        MsgBox("diff")
    End If
End Sub

' Two files are treated as duplicates when their MD5 hashes match.
Public Function CompareFiles(ByVal FirstFile As String, _
                             ByVal SecondFile As String) As Boolean
    Return ReadFile(FirstFile) = ReadFile(SecondFile)
End Function

' Returns the MD5 hash of the file's raw bytes as a hexadecimal string.
Private Function ReadFile(ByVal Path As String) As String
    ' Using closes the stream and releases the hasher even if an exception is thrown.
    Using ReadFileStream As New FileStream(Path, FileMode.Open, FileAccess.Read)
        Using HashData As New MD5CryptoServiceProvider()
            ' ComputeHash reads the stream to its end, so no text encoding is involved.
            Return BitConverter.ToString(HashData.ComputeHash(ReadFileStream))
        End Using
    End Using
End Function
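The directory scan mentioned above could look roughly like the following sketch. The module name DuplicateScanSketch, the helpers FileHash and FindDuplicates, and the folder D:\Incoming are all assumptions made for illustration. Instead of an explicit recursive call, it relies on SearchOption.AllDirectories to visit subfolders, and it only lists the duplicates rather than deleting them.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Security.Cryptography

Module DuplicateScanSketch
    ' Hypothetical helper: MD5 hash of a file's raw bytes as a hex string.
    Function FileHash(ByVal filePath As String) As String
        Using input As New FileStream(filePath, FileMode.Open, FileAccess.Read)
            Using hasher As MD5 = MD5.Create()
                Return BitConverter.ToString(hasher.ComputeHash(input))
            End Using
        End Using
    End Function

    ' Hypothetical helper: groups every file under rootFolder by content hash
    ' and returns only the groups that hold more than one file.
    Function FindDuplicates(ByVal rootFolder As String) As Dictionary(Of String, List(Of String))
        Dim byHash As New Dictionary(Of String, List(Of String))()
        For Each filePath As String In Directory.GetFiles(rootFolder, "*", SearchOption.AllDirectories)
            Dim hashKey As String = FileHash(filePath)
            If Not byHash.ContainsKey(hashKey) Then
                byHash(hashKey) = New List(Of String)()
            End If
            byHash(hashKey).Add(filePath)
        Next

        Dim duplicates As New Dictionary(Of String, List(Of String))()
        For Each entry As KeyValuePair(Of String, List(Of String)) In byHash
            If entry.Value.Count > 1 Then
                duplicates.Add(entry.Key, entry.Value)
            End If
        Next
        Return duplicates
    End Function

    Sub Main()
        ' "D:\Incoming" is a placeholder; point it at the folder to scan.
        For Each entry As KeyValuePair(Of String, List(Of String)) In FindDuplicates("D:\Incoming")
            Console.WriteLine("Same content (" & entry.Key & "):")
            For Each filePath As String In entry.Value
                Console.WriteLine("  " & filePath)
            Next
        Next
    End Sub
End Module
```

Each file is hashed only once and then looked up in a dictionary, which avoids comparing every file against every other file. Directory.GetFiles can throw if a subfolder is not accessible, so a real tool would probably need error handling around the scan.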

