Find duplicate files based on the similarity percentage of their contents

 

find ./src -type f -size -100000c | sim_text -pit 75

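Note: the command above only compares files smaller than 100,000 bytes (-size -100000c). The limit is there for performance reasons: including large text files makes the similarity comparison take a very long time, so it is best to exclude them. In the sim_text options, -p reports similarity as a percentage, -i reads the file names from standard input, and -t 75 sets the reporting threshold to 75%.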
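Before running the comparison, it can help to see which files the size filter leaves out. The sketch below is only an illustration: skipped_files is a hypothetical helper name, and it assumes the same 100000-byte limit as the command above. It simply negates the -size test.

```shell
# Hypothetical helper: list the files that the size filter excludes,
# i.e. regular files that are NOT smaller than 100000 bytes.
# $1 is the directory to scan (defaults to ./src).
skipped_files() {
    find "${1:-./src}" -type f ! -size -100000c
}
```

Running skipped_files ./src prints one path per line, so you can decide whether any of those large files still need to be compared separately.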

You can also restrict the search to a specific file type, for example:

find ./src -type f -name '*.php'  | sim_text -pit 75

This limits the similarity search to *.php files only.
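The file type filter and the size limit can be combined in a single find call. The following is a minimal sketch, not a definitive recipe: candidate_files is a hypothetical helper name, and the directory, pattern and byte limit are parameters chosen for illustration.

```shell
# Hypothetical helper: list regular files under a directory that match
# a name pattern and are smaller than a given number of bytes.
# $1 = directory, $2 = name pattern, $3 = size limit in bytes (exclusive)
candidate_files() {
    find "$1" -type f -name "$2" -size "-${3}c"
}

# Run the comparison only if sim_text is installed and ./src exists.
if command -v sim_text >/dev/null 2>&1 && [ -d ./src ]; then
    candidate_files ./src '*.php' 100000 | sim_text -pit 75
fi
```

Keeping the file selection in its own function makes it easy to check the list of files first, before handing it to sim_text.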

To install the sim_text command (on Debian or Ubuntu):

sudo apt install similarity-tester