module filehelper.synchelper

Short summary

module pyquickhelper.filehelper.synchelper

Series of functions related to folder, explore, synchronize, remove (recursively).

source on GitHub



function – truncated documentation

  • download_urls_iterfile – Same as explore_folder() but iterates on files included in a folder and its subfolders.

  • explore_folder – Returns the list of files included in a folder and its subfolders. Returned names can be modified if sub_pattern is …

  • explore_folder_iterfile – Same as explore_folder() but iterates on files included in a folder and its subfolders.

  • explore_folder_iterfile_repo – Returns all files present in folder and added to a SVN or GIT repository.

  • has_been_updated – It assumes dest is a copy of source, it wants to know if the copy is up to date or not.

  • remove_folder – Removes everything in folder top.

  • synchronize_folder – Synchronizes two folders (or copy if the second is empty), it only copies more recent files. It can walk through …

  • walk – Does the same as os.walk plus does not go through a sub-folder if this one is big. Folders such build …


pyquickhelper.filehelper.synchelper.download_urls_iterfile(folder, pattern=None, neg_pattern=None, fullname=False, recursive=True)

Same as explore_folder but iterates on files included in a folder and its subfolders.

  • folder – folder

  • pattern – if None, get all files, otherwise, a regular expression the filename (with the folder if fullname is True) must match

  • neg_pattern – negative pattern to exclude files

  • fullname – if True, include the subfolder while checking the regex

  • recursive – look into subfolders


Returns: iterator on files
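The filtering rules above can be sketched with os.walk and re. This is a minimal illustration, not the pyquickhelper implementation: the function name iter_matching_files is hypothetical, and it assumes pattern and neg_pattern are applied with re.search to the bare file name, or to the path joined with its folder when fullname is True.

```python
import os
import re
from typing import Iterator, Optional


def iter_matching_files(folder: str, pattern: Optional[str] = None,
                        neg_pattern: Optional[str] = None,
                        fullname: bool = False,
                        recursive: bool = True) -> Iterator[str]:
    """Yields the path of every file accepted by the filters.

    Hypothetical sketch: patterns are tested with re.search, either on the
    file name or, if fullname is True, on the path including its folder.
    """
    pos = re.compile(pattern) if pattern else None
    neg = re.compile(neg_pattern) if neg_pattern else None
    for root, dirs, files in os.walk(folder):
        dirs.sort()  # deterministic traversal order
        for name in sorted(files):
            path = os.path.join(root, name)
            key = path if fullname else name
            if pos is not None and pos.search(key) is None:
                continue  # does not match the positive pattern
            if neg is not None and neg.search(key) is not None:
                continue  # excluded by the negative pattern
            yield path
        if not recursive:
            break  # only the top folder is explored
```

For instance, iter_matching_files("_mynotebooks", pattern=r"[.]ipynb$", neg_pattern="checkpoints", fullname=True) would skip notebooks stored in .ipynb_checkpoints folders, because the negative pattern is then tested on the full path.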


pyquickhelper.filehelper.synchelper.explore_folder(folder, pattern=None, neg_pattern=None, fullname=False, return_only=None, recursive=True, sub_pattern=None, sub_replace=None, fLOG=None)

Returns the list of files included in a folder and its subfolders. Returned names can be modified if sub_pattern is specified.

  • folder – (str) folder

  • pattern – (str) if None, get all files, otherwise, a regular expression the filename (with the folder if fullname is True) must match

  • neg_pattern – (str) negative pattern

  • fullname – (bool) if True, include the subfolder while checking the regex (pattern)

  • return_only – (str) return folders and files (None), only the files ('f') or only the folders ('d')

  • recursive – (bool) look into subfolders

  • sub_pattern – (str) replacement pattern; the output is then modified according to this regular expression

  • sub_replace – (str) if sub_pattern is specified, this second pattern specifies how to replace

  • fLOG – (fct) logging function


Returns: (list, list), a list of folders and a list of files (the folder is not included in the path name)
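The (folders, files) result and the return_only switch can be illustrated with a short sketch built on os.walk. The helper list_folder is hypothetical, not the library's code; it assumes returned paths are relative to folder (following the note that the folder is not included in the path name) and omits the pattern options.

```python
import os
from typing import List, Optional, Tuple


def list_folder(folder: str, return_only: Optional[str] = None,
                recursive: bool = True) -> Tuple[List[str], List[str]]:
    """Hypothetical sketch of a (folders, files) listing.

    return_only: None keeps both lists, 'f' only files, 'd' only folders.
    Paths are relative to *folder* (an assumption of this sketch).
    """
    if return_only not in (None, "f", "d"):
        raise ValueError("return_only must be None, 'f' or 'd'")
    folders: List[str] = []
    files: List[str] = []
    for root, dirs, names in os.walk(folder):
        dirs.sort()  # in-place sort also fixes the traversal order
        rel_root = os.path.relpath(root, folder)
        for kind, entries, out in (("d", dirs, folders), ("f", sorted(names), files)):
            if return_only is not None and return_only != kind:
                continue  # this kind of entry was not requested
            for name in entries:
                out.append(os.path.normpath(os.path.join(rel_root, name)))
        if not recursive:
            break
    return folders, files
```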

Explore the content of a directory

The ls command calls function explore_folder and lists all files or all folders in a directory. Example:

python -m pyquickhelper ls -f _mynotebooks -r f -p .*[.]ipynb -n checkpoints -fu 1

It works better with chrome. An example to change file names:

python -m pyquickhelper ls -f myfolder -p .*[.]py -r f -n pycache -fu 1 -s "test_(.*)" -su "unit_\1"

Or another to automatically create git commands to rename files:

python -m pyquickhelper ls -f _mynotebooks -r f -p .*[.]ipynb -s "(.*)[.]ipynb" -su "git mv \1.ipynb \1~.ipynb"


python -m pyquickhelper ls --help


usage: ls [-h] [-f FOLDER] [-p PATTERN] [-n NEG_PATTERN] [-fu FULLNAME]

Returns the list of files included in a folder and its subfolders. Returned
names can be modified if *sub_pattern* is specified.

optional arguments:
  -h, --help            show this help message and exit
  -f FOLDER, --folder FOLDER
                        (str) folder (default: None)
  -p PATTERN, --pattern PATTERN
                        (str) if None, get all files, otherwise, it is a
                        regular expression, the filename must verify (with the
                        folder if fullname is True) (default: )
  -n NEG_PATTERN, --neg_pattern NEG_PATTERN
                        (str) negative pattern (default: )
  -fu FULLNAME, --fullname FULLNAME
                        (bool) if True, include the subfolder while checking
                        the regex (pattern) (default: False)
  -r RETURN_ONLY, --return_only RETURN_ONLY
                        (str) to return folders and files (*=None*), only the
                        files (*='f'*) or only the folders (*='d') (default: )
  -re RECURSIVE, --recursive RECURSIVE
                        (bool) look into subfolders (default: True)
  -s SUB_PATTERN, --sub_pattern SUB_PATTERN
                        (str) replacements pattern, the output is then
                        modified accordingly to this regular expression
                        (default: )
  -su SUB_REPLACE, --sub_replace SUB_REPLACE
                        (str) if sub_pattern is specified, this second pattern
                        specifies how to replace (default: )
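The -s/-su examples above behave like a regular-expression substitution on each returned name. Assuming they correspond to Python's re.sub (an assumption: the exact call inside pyquickhelper is not shown on this page), the renaming examples can be reproduced on a plain list of names; rename_names is a hypothetical helper.

```python
import re
from typing import Iterable, List


def rename_names(names: Iterable[str], sub_pattern: str, sub_replace: str) -> List[str]:
    """Applies re.sub(sub_pattern, sub_replace, name) to every name.

    Hypothetical helper mirroring the -s / -su options of the ls command.
    """
    regex = re.compile(sub_pattern)
    return [regex.sub(sub_replace, name) for name in names]


# Same patterns as the two command lines above.
print(rename_names(["myfolder/test_files.py"], r"test_(.*)", r"unit_\1"))
# prints ['myfolder/unit_files.py']
print(rename_names(["nb/intro.ipynb"], r"(.*)[.]ipynb", r"git mv \1.ipynb \1~.ipynb"))
# prints ['git mv nb/intro.ipynb nb/intro~.ipynb']
```

Names that do not match the pattern are returned unchanged, since re.sub leaves a string untouched when it finds no match.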


pyquickhelper.filehelper.synchelper.explore_folder_iterfile(folder, pattern=None, neg_pattern=None, fullname=False, recursive=True, verbose=False)

Same as explore_folder but iterates on files included in a folder and its subfolders.

  • folder – folder

  • pattern – if None, get all files, otherwise, a regular expression the filename (with the folder if fullname is True) must match

  • neg_pattern – negative pattern to exclude files

  • fullname – if True, include the subfolder while checking the regex

  • recursive – look into subfolders

  • verbose – use tqdm to display a progress bar


Returns: iterator on files


pyquickhelper.filehelper.synchelper.explore_folder_iterfile_repo(folder, log=<function fLOG>)

Returns all files present in folder and added to an SVN or Git repository.

  • folder – folder

  • log – log function
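For the Git case, one way to obtain the tracked files is git ls-files. The sketch below is an illustration under that assumption, not pyquickhelper's implementation; the helper names are hypothetical, SVN is not covered, and list_tracked_files requires git on the PATH.

```python
import os
import subprocess
from typing import List


def parse_ls_files(folder: str, output: bytes) -> List[str]:
    """Turns the NUL-separated output of `git ls-files -z` into paths."""
    names = [item.decode("utf-8") for item in output.split(b"\0") if item]
    return [os.path.join(folder, name) for name in names]


def list_tracked_files(folder: str) -> List[str]:
    """Returns the files tracked by git under *folder* (hypothetical helper).

    git ls-files prints paths relative to the directory it runs in, so they
    are joined back to *folder*; -z avoids quoting of unusual file names.
    """
    result = subprocess.run(["git", "ls-files", "-z"], cwd=folder,
                            capture_output=True, check=True)
    return parse_ls_files(folder, result.stdout)
```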




pyquickhelper.filehelper.synchelper.has_been_updated(source, dest)

Assuming dest is a copy of source, checks whether the copy is up to date.

  • source – filename

  • dest – copy


Returns: (True, reason) or (False, None)
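A check returning (True, reason) or (False, None) can be sketched by comparing existence, size, modification time and finally content. The order of the checks and the reason strings ("missing", "size", "date", "content") are assumptions of this sketch, and copy_is_outdated is a hypothetical name, not the library's function.

```python
import os
from typing import Optional, Tuple


def _same_bytes(path1: str, path2: str, chunk_size: int = 1 << 16) -> bool:
    """Compares two files chunk by chunk."""
    with open(path1, "rb") as f1, open(path2, "rb") as f2:
        while True:
            b1, b2 = f1.read(chunk_size), f2.read(chunk_size)
            if b1 != b2:
                return False
            if not b1:  # both files ended at the same time
                return True


def copy_is_outdated(source: str, dest: str) -> Tuple[bool, Optional[str]]:
    """Hypothetical check: (True, reason) if dest must be refreshed."""
    if not os.path.exists(dest):
        return True, "missing"
    st_src, st_dst = os.stat(source), os.stat(dest)
    if st_src.st_size != st_dst.st_size:
        return True, "size"
    if st_src.st_mtime > st_dst.st_mtime:
        return True, "date"
    # Same size and the copy is not older: compare the bytes to be sure.
    if not _same_bytes(source, dest):
        return True, "content"
    return False, None
```

The cheap metadata checks come first so that file contents are only read when size and date cannot decide.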


pyquickhelper.filehelper.synchelper.remove_folder(top, remove_also_top=True, raise_exception=True)

Removes everything in folder top.

  • top – path to remove

  • remove_also_top – also remove the root folder top

  • raise_exception – raise an exception if a file cannot be removed


Returns: list of removed files and folders, as a list of tuples (name, "file" or "dir")
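A bottom-up deletion returning (name, "file" or "dir") tuples can be sketched with os.walk(topdown=False), which visits children before their parent so every folder is already empty when it is removed. remove_tree is a hypothetical name; this is an illustration of the documented behaviour, not the library's code.

```python
import os
from typing import Callable, List, Tuple


def remove_tree(top: str, remove_also_top: bool = True,
                raise_exception: bool = True) -> List[Tuple[str, str]]:
    """Hypothetical sketch: deletes the content of *top* bottom-up."""
    removed: List[Tuple[str, str]] = []

    def attempt(func: Callable[[str], None], path: str, kind: str) -> None:
        try:
            func(path)
            removed.append((path, kind))
        except OSError:
            if raise_exception:
                raise  # otherwise the entry is silently kept

    # topdown=False yields the deepest folders first.
    for root, dirs, files in os.walk(top, topdown=False):
        for name in files:
            attempt(os.remove, os.path.join(root, name), "file")
        for name in dirs:
            path = os.path.join(root, name)
            # A symbolic link to a folder is listed in dirs but must be unlinked.
            attempt(os.remove if os.path.islink(path) else os.rmdir, path, "dir")
    if remove_also_top:
        attempt(os.rmdir, top, "dir")
    return removed
```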


pyquickhelper.filehelper.synchelper.synchronize_folder(p1: str, p2: str, hash_size=1048576, repo1=False, repo2=False, size_different=True, no_deletion=False, filter: [str, typing.Callable[[str], str], None] = None, filter_copy: [str, typing.Callable[[str], str], None] = None, avoid_copy=False, operations=None, file_date: str = None, log1=False, copy_1to2=False, create_dest=False, fLOG=<function fLOG>)

Synchronizes two folders (or copies the first one into the second if the second is empty); it only copies more recent files. It can walk through a Git or SVN repository.

  • p1 – (str) first path

  • p2 – (str) second path

  • hash_size – (int) used to check whether or not two files are different

  • repo1 – (bool) assuming the first folder is under SVN or GIT, it uses pysvn to get the list of files (avoiding any extra files)

  • repo2 – (bool) assuming the second folder is under SVN or GIT, it uses pysvn to get the list of files (avoiding any extra files)

  • size_different – (bool) if True, a file is copied only if the sizes are different, otherwise, it is copied if the first file is more recent

  • no_deletion – (bool) if a file is found in the second folder and not in the first one, it is removed unless no_deletion is True

  • filter – (str) None to accept every file, a string if it is a regular expression, a function for something more complex: function (fullname) --> True (every file is considered in lower case), (use

  • filter_copy – (str) None to accept every file, a string if it is a regular expression, a function for something more complex: function (fullname) –> True

  • avoid_copy – (bool) if True, only returns the list of files which should be copied, without copying them

  • operations – if not None, this function is called as operations(op, n1, n2) and should return True if the file was updated

  • file_date – (str) filename which contains information about when the last sync was done

  • log1 – see FileTreeNode

  • copy_1to2 – (bool) only copy files from p1 to p2

  • create_dest – (bool) create the destination directory if it does not exist

  • fLOG – logging function


Returns: list of operations done by the function, as a list of 3-tuples (action, source_file, dest_file)

If file_date is specified, the second folder is not explored. Only the modified files are taken into account (except for the first synchronization).
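The size_different, no_deletion and avoid_copy options can be illustrated with a one-way synchronization which first plans a list of (action, source_file, dest_file) tuples and then optionally applies it. This is a simplified sketch of the documented behaviour, not synchronize_folder itself: plan_one_way_sync and the action names "new", "update" and "delete" are hypothetical, and filters, repositories and file_date are left out.

```python
import os
import shutil
from typing import List, Optional, Set, Tuple

Operation = Tuple[str, Optional[str], Optional[str]]


def _relative_files(root: str) -> Set[str]:
    """Returns every file below *root* as a path relative to *root*."""
    found = set()
    for folder, _, files in os.walk(root):
        for name in files:
            found.add(os.path.relpath(os.path.join(folder, name), root))
    return found


def plan_one_way_sync(p1: str, p2: str, size_different: bool = True,
                      no_deletion: bool = False,
                      avoid_copy: bool = False) -> List[Operation]:
    """Hypothetical one-way sync from p1 to p2 returning (action, src, dst)."""
    src_files, dst_files = _relative_files(p1), _relative_files(p2)
    operations: List[Operation] = []
    for rel in sorted(src_files):
        src, dst = os.path.join(p1, rel), os.path.join(p2, rel)
        if rel not in dst_files:
            operations.append(("new", src, dst))
        elif size_different:
            # Copy only when the sizes differ.
            if os.path.getsize(src) != os.path.getsize(dst):
                operations.append(("update", src, dst))
        elif os.path.getmtime(src) > os.path.getmtime(dst):
            # Otherwise copy when the first file is more recent.
            operations.append(("update", src, dst))
    if not no_deletion:
        for rel in sorted(dst_files - src_files):
            operations.append(("delete", None, os.path.join(p2, rel)))
    if not avoid_copy:
        for action, src, dst in operations:
            if action == "delete":
                os.remove(dst)
            else:
                os.makedirs(os.path.dirname(dst), exist_ok=True)
                shutil.copy2(src, dst)  # copy2 also keeps the modification time
    return operations
```

Planning before acting is what makes avoid_copy cheap: the same list is either returned as a preview or executed.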

synchronize two folders

The following code synchronizes a folder with another one on a USB drive or a network drive. To minimize the number of accesses to the other location, it stores the status of the previous synchronization in a file (status_copy.txt in the example below). Next time, the function goes through the directory and its subdirectories and only propagates the modifications which happened since the last synchronization. The function filter_copy defines which files to synchronize.

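from pyquickhelper.filehelper.synchelper import synchronize_folder


def filter_copy(file):
    # Skip every file whose path contains this marker.
    return "_don_t_synchronize_" not in file


synchronize_folder("c:/mydata",
                   "g:/mydata",  # destination, hypothetical path (e.g. a USB drive)
                   hash_size=0,
                   filter_copy=filter_copy,
                   file_date="c:/status_copy.txt")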

The function is able to go through 90,000 files and 90 GB in 12 minutes (for an update).
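The status-file idea described above can be sketched independently of the copy itself: record each file's size and modification time after a synchronization, then report only the files whose record changed. The JSON format and the helper names scan and modified_since_last_sync are assumptions; the actual format of status_copy.txt is not documented on this page.

```python
import json
import os
from typing import Dict, List, Tuple

Signature = Tuple[int, float]


def scan(folder: str) -> Dict[str, Signature]:
    """Maps each relative file path to its (size, modification time)."""
    status = {}
    for root, _, files in os.walk(folder):
        for name in files:
            path = os.path.join(root, name)
            st = os.stat(path)
            status[os.path.relpath(path, folder)] = (st.st_size, st.st_mtime)
    return status


def modified_since_last_sync(folder: str, status_file: str) -> List[str]:
    """Returns files changed since the previous call and updates status_file.

    Only *folder* is scanned: the destination is never explored, which keeps
    the number of accesses to a slow drive low. On the first call every
    file is reported because no status exists yet.
    """
    previous: Dict[str, List[float]] = {}
    if os.path.exists(status_file):
        with open(status_file, "r", encoding="utf-8") as f:
            previous = json.load(f)
    current = scan(folder)
    # JSON stores tuples as lists, hence the comparison with list(sig).
    changed = [rel for rel, sig in sorted(current.items())
               if previous.get(rel) != list(sig)]
    with open(status_file, "w", encoding="utf-8") as f:
        json.dump(current, f)
    return changed
```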


pyquickhelper.filehelper.synchelper.walk(top, onerror=None, followlinks=False, neg_filter=None)

Does the same as os.walk but does not go through a subfolder if it is big. Folders such as build, Debug or Release may not need to be explored.

  • top – folder

  • onerror – see os.walk

  • followlinks – see os.walk

  • neg_filter – a string used as a filter; every folder matching it is excluded (file pattern, not a regular expression pattern)


Returns: see os.walk
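Skipping folders during a walk relies on a documented os.walk feature: with topdown=True, editing the dirs list in place prevents the walk from descending into the removed entries. The sketch below assumes neg_filter is a shell-style pattern checked with fnmatch against the folder name only; walk_pruned is a hypothetical name, not the library's function.

```python
import fnmatch
import os
from typing import Iterator, List, Optional, Tuple


def walk_pruned(top: str, neg_filter: Optional[str] = None,
                followlinks: bool = False
                ) -> Iterator[Tuple[str, List[str], List[str]]]:
    """os.walk which never descends into folders matching neg_filter.

    Hypothetical sketch: neg_filter is a shell-style pattern (fnmatch),
    e.g. "build" or "Rel*", tested against the folder name only.
    """
    for root, dirs, files in os.walk(top, followlinks=followlinks):
        if neg_filter is not None:
            # Slice assignment edits the list os.walk uses for recursion.
            dirs[:] = [d for d in dirs if not fnmatch.fnmatch(d, neg_filter)]
        yield root, dirs, files
```

Rebinding dirs to a new list (dirs = [...]) would have no effect; the slice assignment is what tells os.walk to skip the excluded folders.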
