Duplicate Finder is a simple utility that looks through files and finds identical lines, which indicate duplicate code or cut-and-paste coding. It is written in C# using some .NET 2.0 features.
It is aimed at detecting duplicate statements in source code, but uses simple text comparison techniques that should work on other kinds of text files.
I wrote this program for people like myself: C# programmers with an interest in code quality and in automated tools that find code quality problems. However, you may find other uses for it.
The source code consists of:
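The idea of finding runs of identical lines by simple text comparison can be sketched as follows. This is an illustration in Python, not the tool's C# implementation: the function name `common_runs`, the exact line-by-line comparison, and the sample data are all assumptions made for the example.

```python
def common_runs(a, b, threshold):
    """Return (start_a, start_b, length) for each maximal run of identical
    consecutive lines shared by the line lists a and b.

    Line numbers are 1-based. Runs shorter than threshold are ignored.
    Hypothetical helper for illustration only.
    """
    rows, cols = len(a), len(b)
    # run[i][j] = length of the identical run ending at a[i-1] and b[j-1]
    run = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            if a[i - 1] == b[j - 1]:
                run[i][j] = run[i - 1][j - 1] + 1

    found = []
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            length = run[i][j]
            # A run is maximal when the next pair of lines does not match.
            extends = i < rows and j < cols and a[i] == b[j]
            if length >= threshold and not extends:
                found.append((i - length + 1, j - length + 1, length))
    return found


# Made-up sample data: two files that share five lines in the middle.
file1 = ["header", "x1", "x2", "x3", "x4", "x5", "tail A"]
file2 = ["other", "x1", "x2", "x3", "x4", "x5", "tail B"]

for start_a, start_b, length in common_runs(file1, file2, threshold=4):
    print("Duplicate of length", length)
    print("  Line %d-%d in file1" % (start_a, start_a + length - 1))
    print("  Line %d-%d in file2" % (start_b, start_b + length - 1))
```

The table costs time and memory proportional to the product of the two file lengths, which is acceptable for a sketch but would need a smarter approach (for example, hashing lines) on a large source tree.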
DuplicateFinderLib - the engine for finding duplicates
DupFinder.exe - the command-line tool that uses the engine
DuplicateFinder.Tasks - the MSBuild task for the engine. This allows integration with C# automated builds.
DuplicateFinder.TestLibrary - unit test cases using NUnit
DuplicateFinder 1.5
Usage of the command-line tool is as follows, for example:
>DupFinder.exe -t4 test5*.txt
Processing in C:\Code\DuplicateFinder\TestData
2 files read
Duplicate of length 5 at:
Line 2-6 in C:\Code\DuplicateFinder\TestData\Test5Lines1.txt
Line 2-6 in C:\Code\DuplicateFinder\TestData\Test5Lines2.txt
1 duplicate found
A more realistic example for C# code searches all .cs files in the source tree for duplicates of 9 lines or more, excluding the generated files called AssemblyInfo.cs:
C:\Code\DuplicateFinder>DupFinder.exe -t9 -r -eAssemblyInfo.cs *.cs
Processing in C:\Code\DuplicateFinder
11 files read
Duplicate of length 11 at:
Line 1-11 in C:\Code\DuplicateFinder\TestLibrary\TestAllFiles.cs
Line 1-11 in C:\Code\DuplicateFinder\TestLibrary\TestFiles.cs
Line 1-11 in C:\Code\DuplicateFinder\TestLibrary\TestProgramFile.cs
1 duplicate found
The duplicate finder has found that the test cases have the same using and namespace lines at the top.
Duplicate Finder command-line arguments explained
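Going by the examples above, -t sets the minimum duplicate length, -r searches the whole source tree, and -e excludes files by name. The file selection implied by -r and -e can be sketched as below. This is a Python illustration under stated assumptions, not the tool's code: the function name `select_files`, the case-insensitive matching, and the treatment of the exclusion as an exact file name are all guesses.

```python
import fnmatch
import os


def select_files(root, pattern, recurse=False, exclude=None):
    """Return sorted paths under root whose file name matches pattern.

    recurse - descend into subfolders (what -r appears to do)
    exclude - file name to skip (what -e appears to do); assumed here to
              be an exact name compared without regard to case
    Hypothetical helper for illustration only.
    """
    selected = []
    for folder, subfolders, names in os.walk(root):
        if not recurse:
            subfolders[:] = []  # prune the walk: stay in the top folder
        for name in names:
            if not fnmatch.fnmatch(name.lower(), pattern.lower()):
                continue
            if exclude and name.lower() == exclude.lower():
                continue
            selected.append(os.path.join(folder, name))
    return sorted(selected)
```

A call such as `select_files(r"C:\Code\DuplicateFinder", "*.cs", recurse=True, exclude="AssemblyInfo.cs")` would then correspond to the second example's `-r -eAssemblyInfo.cs *.cs` arguments.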
The MSBuild Task