Moad Computer, the actionable insights company
Actionable Insights blog




A simple duplicate files checker

5/31/2022


Dr. Rahul Remanan

CEO, Moad Computer

An example implementation of duplicate file detection using Python. This could be used as the backbone for a de-duplicated file system.
Import libraries
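The notebook cells are not reproduced on this page, so the snippets in this post are minimal sketches rather than the original code. Assuming the standard library is enough for this task, the imports could look like this:

```python
import hashlib                       # provides the SHA-256 and BLAKE2 hash functions
import os                            # file paths and directory traversal
from collections import defaultdict  # grouping file paths under a shared hash
```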
Compute file hashes
File hashes are computed by reading each file in chunks of a specified size and passing each chunk to either a SHA-256 or a BLAKE2 hash function from Python's hashlib library.
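A minimal sketch of chunked hashing follows. The function name `file_hash`, the `algorithm` strings, and the 64 KiB default chunk size are assumptions made for illustration, and `blake2b` is assumed to be the Blake variant meant here.

```python
import hashlib


def file_hash(path, algorithm="sha256", chunk_size=65536):
    """Return the hex digest of the file at `path`, read chunk by chunk."""
    # Hypothetical selector: only the two hash families named in the post.
    if algorithm == "sha256":
        hasher = hashlib.sha256()
    elif algorithm == "blake2b":
        hasher = hashlib.blake2b()
    else:
        raise ValueError(f"unsupported algorithm: {algorithm}")

    with open(path, "rb") as handle:
        # Read fixed-size chunks so a large file is never held fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()
```

Because the hash object is updated incrementally, the digest is the same as hashing the whole file in one call; the chunk size only affects memory use and read performance.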
Detect file duplicates
This step builds a dictionary in which each key is a cryptographic hash and each value is the list of files that share that hash.
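One way to build such a dictionary is sketched below. The names `group_by_hash` and `file_hash` are hypothetical, and the sketch uses SHA-256 only to stay short.

```python
import hashlib
from collections import defaultdict


def file_hash(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    hasher = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def group_by_hash(paths):
    """Map each digest to the list of paths whose contents produce it."""
    groups = defaultdict(list)
    for path in paths:
        # Files with identical contents land in the same list.
        groups[file_hash(path)].append(path)
    return dict(groups)
```

Calling `group_by_hash` on a list of file paths returns a plain dictionary, with the paths in each list kept in the order they were supplied.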
Find duplicates in a list of files
To find the duplicate files, iterate over the keys of the file comparison dictionary and keep the entries whose file list contains more than one item.
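The filtering step can be sketched as follows. The function name `find_duplicates` is an assumption, and the hashes and file names in `example` are placeholders rather than real digests.

```python
def find_duplicates(hash_to_files):
    """Keep only the hashes that are shared by more than one file."""
    return {
        digest: files
        for digest, files in hash_to_files.items()
        if len(files) > 1
    }


# Placeholder data standing in for a real file comparison dictionary.
example = {
    "hash-a": ["report.txt", "copy_of_report.txt"],
    "hash-b": ["notes.txt"],
}

duplicates = find_duplicates(example)
for digest, files in duplicates.items():
    print(digest, files)
```

Only the `hash-a` entry survives the filter, since it is the only hash associated with two files.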