Duplication Detector

Duplication Detector, created for en:Wikipedia:Copyright problems on the English Wikipedia, is a tool that compares any two web pages to identify text that has been copied from one to the other. Either, neither, or both pages may be current or old revisions of a Wikipedia article. It normalizes the text of both pages and then performs exact matching using a simple word n-gram index. The source is available under the Simplified BSD license. Some feature work remains, notably approximate matching, but the tool is mostly complete.
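
The approach described above (normalization followed by exact matching against a word n-gram index) can be illustrated with a minimal Python sketch. This is not the tool's actual source: the function names (normalize, build_index, find_matches), the normalization rules (lowercasing and dropping punctuation), and the default n-gram size of 3 are all assumptions made for illustration.

```python
import re
from collections import defaultdict


def normalize(text):
    # Assumed normalization: lowercase, keep only runs of letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(words, n):
    # Map each n-gram (a tuple of n consecutive words) to its start positions.
    index = defaultdict(list)
    for i in range(len(words) - n + 1):
        index[tuple(words[i:i + n])].append(i)
    return index


def find_matches(text_a, text_b, n=3):
    # Return word runs of length >= n from text_b that also occur in text_a.
    a = normalize(text_a)
    b = normalize(text_b)
    index = build_index(a, n)
    matches = []
    j = 0
    while j <= len(b) - n:
        best = 0
        for i in index.get(tuple(b[j:j + n]), []):
            # Extend the exact match word by word for as long as both agree.
            length = n
            while (i + length < len(a) and j + length < len(b)
                   and a[i + length] == b[j + length]):
                length += 1
            best = max(best, length)
        if best:
            matches.append(" ".join(b[j:j + best]))
            j += best  # skip past the matched run
        else:
            j += 1
    return matches


if __name__ == "__main__":
    page_a = "The quick brown fox jumps over the lazy dog."
    page_b = "A quick, brown fox jumps high; the lazy dog sleeps."
    for phrase in find_matches(page_a, page_b):
        print(phrase)
```

Because matching is exact after normalization, differences in case and punctuation are ignored, but a single changed word splits a copied passage into separate matches (or hides it entirely if the remaining pieces are shorter than n words). Approximate matching, mentioned above as remaining work, would address that case.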