module filehelper.synchelper

Short summary

module pyquickhelper.filehelper.synchelper

Series of functions to explore, synchronize, and remove folders (recursively).

source on GitHub

Functions

function – truncated documentation

  • download_urls_iterfile – Same as explore_folder() but iterates on files included in a folder and its subfolders.

  • explore_folder – Returns the list of files included in a folder and its subfolders. Returned names can be modified if sub_pattern is …

  • explore_folder_iterfile – Same as explore_folder() but iterates on files included in a folder and its subfolders.

  • explore_folder_iterfile_repo – Returns all files present in folder and added to a SVN or GIT repository.

  • has_been_updated – It assumes dest is a copy of source, it wants to know if the copy is up to date or not.

  • remove_folder – Removes everything in folder top.

  • synchronize_folder – Synchronizes two folders (or copy if the second is empty), it only copies more recent files. It can walk through …

  • walk – Does the same as os.walk plus does not go through a sub-folder if this one is big. Folders such build

Documentation

pyquickhelper.filehelper.synchelper.download_urls_iterfile(folder, pattern=None, neg_pattern=None, fullname=False, recursive=True)[source]

Same as explore_folder but iterates on files included in a folder and its subfolders.

Parameters:
  • folder – folder

  • pattern – if None, get all files; otherwise, a regular expression the filename must match (including the folder if fullname is True)

  • neg_pattern – negative pattern to exclude files

  • fullname – if True, include the subfolder while checking the regex

  • recursive – look into subfolders

Returns:

iterator on files

pyquickhelper.filehelper.synchelper.explore_folder(folder, pattern=None, neg_pattern=None, fullname=False, return_only=None, recursive=True, sub_pattern=None, sub_replace=None, fLOG=None)[source]

Returns the list of files included in a folder and its subfolders. Returned names can be modified if sub_pattern is specified.

Parameters:
  • folder – (str) folder

  • pattern – (str) if None, get all files; otherwise, a regular expression the filename must match (including the folder if fullname is True)

  • neg_pattern – (str) negative pattern

  • fullname – (bool) if True, include the subfolder while checking the regex (pattern)

  • return_only – (str) to return folders and files (=None), only the files (='f') or only the folders (='d')

  • recursive – (bool) look into subfolders

  • sub_pattern – (str) replacement pattern; the output is modified according to this regular expression

  • sub_replace – (str) if sub_pattern is specified, this second pattern specifies how to replace

  • fLOG – (fct) logging function

Returns:

(list, list), a list of folders and a list of files (the folder is not included in the path name)
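The parameters above can be illustrated with a minimal standard-library sketch. It is not the implementation of explore_folder: the helper name list_folder is hypothetical, patterns are checked against the file name only (as with fullname=False), the substitution is applied to the file name only, and returning paths relative to the folder is one reading of the note above.

```python
import os
import re
import tempfile


def list_folder(folder, pattern=None, neg_pattern=None,
                sub_pattern=None, sub_replace=None):
    """Hypothetical helper: returns (folders, files), relative to `folder`."""
    keep = re.compile(pattern) if pattern else None
    drop = re.compile(neg_pattern) if neg_pattern else None
    folders, files = [], []
    for root, dirnames, filenames in os.walk(folder):
        rel_root = os.path.relpath(root, folder)
        for d in dirnames:
            folders.append(os.path.normpath(os.path.join(rel_root, d)))
        for name in filenames:
            # Keep the file only if it matches `pattern` and not `neg_pattern`.
            if keep and not keep.search(name):
                continue
            if drop and drop.search(name):
                continue
            # Optionally rewrite the returned name (assumption: name only).
            if sub_pattern:
                name = re.sub(sub_pattern, sub_replace or "", name)
            files.append(os.path.normpath(os.path.join(rel_root, name)))
    return sorted(folders), sorted(files)


# Small demonstration on a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "sub"))
    for rel in ("a.py", "b.txt", os.path.join("sub", "test_c.py")):
        open(os.path.join(tmp, rel), "w").close()
    folders, files = list_folder(tmp, pattern=r".*[.]py$",
                                 sub_pattern=r"test_(.*)",
                                 sub_replace=r"unit_\1")
    print(folders, files)
```

In this demonstration, b.txt is filtered out by the pattern and test_c.py is reported under its rewritten name.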

Explore the content of a directory

The command calls the function explore_folder and lists all files or all folders in a directory. Example:

python -m pyquickhelper ls -f _mynotebooks -r f -p .*[.]ipynb -n checkpoints -fu 1

It works better with chrome. An example to change file names:

python -m pyquickhelper ls -f myfolder -p .*[.]py -r f -n pycache -fu 1 -s test_(.*) -su unit_\1

Or another to automatically create git commands to rename files:

python -m pyquickhelper ls -f _mynotebooks -r f -p .*[.]ipynb -s "(.*)[.]ipynb" -su "git mv \1.ipynb \1~.ipynb"
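The -s / -su pair in the last command behaves like a regular-expression substitution applied to each name. A short sketch with Python's re.sub shows the effect of that pattern and replacement; the notebook names are made up for illustration.

```python
import re

# Hypothetical notebook names, standing in for the files found by `ls`.
names = ["intro.ipynb", "plots.ipynb"]

sub_pattern = r"(.*)[.]ipynb"
sub_replace = r"git mv \1.ipynb \1~.ipynb"

# Each name is rewritten into a git command that renames the file.
commands = [re.sub(sub_pattern, sub_replace, name) for name in names]
for command in commands:
    print(command)
```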

<<<

python -m pyquickhelper ls --help

>>>

usage: ls [-h] [-f FOLDER] [-p PATTERN] [-n NEG_PATTERN] [-fu FULLNAME]
          [-r RETURN_ONLY] [-re RECURSIVE] [-s SUB_PATTERN] [-su SUB_REPLACE]

Returns the list of files included in a folder and its subfolders. Returned
names can be modified if *sub_pattern* is specified.

optional arguments:
  -h, --help            show this help message and exit
  -f FOLDER, --folder FOLDER
                        (str) folder (default: None)
  -p PATTERN, --pattern PATTERN
                        (str) if None, get all files, otherwise, it is a
                        regular expression, the filename must verify (with the
                        folder if fullname is True) (default: )
  -n NEG_PATTERN, --neg_pattern NEG_PATTERN
                        (str) negative pattern (default: )
  -fu FULLNAME, --fullname FULLNAME
                        (bool) if True, include the subfolder while checking
                        the regex (pattern) (default: False)
  -r RETURN_ONLY, --return_only RETURN_ONLY
                        (str) to return folders and files (*=None*), only the
                        files (*='f'*) or only the folders (*='d') (default: )
  -re RECURSIVE, --recursive RECURSIVE
                        (bool) look into subfolders (default: True)
  -s SUB_PATTERN, --sub_pattern SUB_PATTERN
                        (str) replacements pattern, the output is then
                        modified accordingly to this regular expression
                        (default: )
  -su SUB_REPLACE, --sub_replace SUB_REPLACE
                        (str) if sub_pattern is specified, this second pattern
                        specifies how to replace (default: )

pyquickhelper.filehelper.synchelper.explore_folder_iterfile(folder, pattern=None, neg_pattern=None, fullname=False, recursive=True, verbose=False)[source]

Same as explore_folder but iterates on files included in a folder and its subfolders.

Parameters:
  • folder – folder

  • pattern – if None, get all files; otherwise, a regular expression the filename must match (including the folder if fullname is True)

  • neg_pattern – negative pattern to exclude files

  • fullname – if True, include the subfolder while checking the regex

  • recursive – look into subfolders

  • verbose – use tqdm to display a progress bar

Returns:

iterator on files
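The difference with a list-returning function is that files are produced lazily, one at a time. A minimal generator sketch of that idea follows; iter_files is a hypothetical name, it relies on os.walk, and it does not reproduce the fullname or verbose options.

```python
import os
import re


def iter_files(folder, pattern=None, neg_pattern=None, recursive=True):
    """Hypothetical generator: yields matching file paths one by one."""
    keep = re.compile(pattern) if pattern else None
    drop = re.compile(neg_pattern) if neg_pattern else None
    for root, dirnames, filenames in os.walk(folder):
        if not recursive:
            # Emptying dirnames in place stops os.walk from descending.
            dirnames[:] = []
        for name in sorted(filenames):
            if keep and not keep.search(name):
                continue
            if drop and drop.search(name):
                continue
            yield os.path.join(root, name)
```

A caller can stop iterating early without the whole tree having been listed, which matters for large folders.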

pyquickhelper.filehelper.synchelper.explore_folder_iterfile_repo(folder, log=<function fLOG>)[source]

Returns all files present in folder and added to a SVN or GIT repository.

Parameters:
  • folder – folder

  • log – log function

Returns:

iterator

pyquickhelper.filehelper.synchelper.has_been_updated(source, dest)[source]

Assuming dest is a copy of source, checks whether the copy is still up to date.

Parameters:
  • source – filename

  • dest – copy

Returns:

(True, reason) or (False, None)
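The return convention can be sketched with a small standard-library function. The criteria used here (missing copy, different size, more recent source) and the reason strings are assumptions made for illustration, not the checks has_been_updated actually performs.

```python
import os


def needs_refresh(source, dest):
    """Hypothetical check: returns (True, reason) or (False, None)."""
    if not os.path.exists(dest):
        return True, "missing"
    src_stat, dst_stat = os.stat(source), os.stat(dest)
    if src_stat.st_size != dst_stat.st_size:
        return True, "size"
    if src_stat.st_mtime > dst_stat.st_mtime:
        return True, "date"
    return False, None
```

Returning the reason alongside the flag lets the caller log why a file is about to be copied again.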

pyquickhelper.filehelper.synchelper.remove_folder(top, remove_also_top=True, raise_exception=True)[source]

Removes everything in folder top.

Parameters:
  • top – path to remove

  • remove_also_top – remove also root

  • raise_exception – raise an exception if a file cannot be removed

Returns:

list of removed files and folders, as a list of tuples (name, "file" or "dir")
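A bottom-up walk is one simple way to obtain this kind of result. The sketch below is an illustration with a hypothetical name, remove_tree; it ignores raise_exception and does not handle symbolic links or read-only files.

```python
import os


def remove_tree(top, remove_also_top=True):
    """Hypothetical helper: deletes the content of `top`, returns what was removed."""
    removed = []
    # topdown=False visits the deepest folders first, so each folder
    # is already empty when os.rmdir is called on it.
    for root, dirnames, filenames in os.walk(top, topdown=False):
        for name in filenames:
            path = os.path.join(root, name)
            os.remove(path)
            removed.append((path, "file"))
        for name in dirnames:
            path = os.path.join(root, name)
            os.rmdir(path)
            removed.append((path, "dir"))
    if remove_also_top:
        os.rmdir(top)
        removed.append((top, "dir"))
    return removed
```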

pyquickhelper.filehelper.synchelper.synchronize_folder(p1: str, p2: str, hash_size=1048576, repo1=False, repo2=False, size_different=True, no_deletion=False, filter: [<class 'str'>, typing.Callable[[str], str], None] = None, filter_copy: [<class 'str'>, typing.Callable[[str], str], None] = None, avoid_copy=False, operations=None, file_date: str = None, log1=False, copy_1to2=False, create_dest=False, fLOG=<function fLOG>)[source]

Synchronizes two folders (or copies the first one if the second is empty); it only copies more recent files. It can walk through a Git or SVN repository.

Parameters:
  • p1 – (str) first path

  • p2 – (str) second path

  • hash_size – (int) used to check whether or not two files are different

  • repo1 – (bool) assuming the first folder is under SVN or GIT, it uses pysvn to get the list of files (avoiding any extra files)

  • repo2 – (bool) assuming the second folder is under SVN or GIT, it uses pysvn to get the list of files (avoiding any extra files)

  • size_different – (bool) if True, a file will be copied only if the sizes are different, otherwise, it will be copied if the first file is more recent

  • no_deletion – (bool) if a file is found in the second folder and not in the first one, it will be removed unless no_deletion is True

  • filter – (str) None to accept every file, a string if it is a regular expression (used with re.search), a function for something more complex: function (fullname) –> True (every file is considered in lower case)

  • filter_copy – (str) None to accept every file, a string if it is a regular expression, a function for something more complex: function (fullname) –> True

  • avoid_copy – (bool) if True, just return the list of files which should be copied but does not do the copy

  • operations – if not None, this function is called the following way: operations(op, n1, n2); it should return True if the file was updated

  • file_date – (str) filename which contains information about when the last sync was done

  • log1 – see FileTreeNode

  • copy_1to2 – (bool) only copy files from p1 to p2

  • create_dest – (bool) create the destination directory if it does not exist

  • fLOG – logging function

Returns:

list of operations done by the function, a list of 3-tuples: (action, source_file, dest_file)

If file_date is specified, the second folder is not explored. Only the modified files will be taken into account (except for the first sync).

Synchronize two folders

The following function synchronizes a folder with another one on a USB drive or a network drive. To minimize the number of accesses to the other location, it stores the status of the previous synchronization in a file (status_copy.txt in the example below). The next time, the function goes through the directory and its sub-directories and only propagates the modifications made since the last synchronization. The function filter_copy defines which files to synchronize.

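from pyquickhelper.filehelper.synchelper import synchronize_folder

def filter_copy(file):
    # Skip any path that contains the marker "_don_t_synchronize_".
    return "_don_t_synchronize_" not in file

synchronize_folder("c:/mydata",
                   "g:/mybackup",
                   hash_size=0,
                   filter_copy=filter_copy,
                   file_date="c:/status_copy.txt")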

The function is able to go through 90,000 files and 90 GB in 12 minutes (for an update).
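The core idea, copying only files that are missing or older in the destination and reporting the operations, can be sketched with the standard library. This is a simplified one-way illustration, not synchronize_folder itself: the name sync_newer and the action label "copy" are hypothetical, and deletions, hashing, repositories, and the file_date status file are left out.

```python
import os
import shutil


def sync_newer(p1, p2, filter_copy=None, avoid_copy=False):
    """Hypothetical one-way sync: returns a list of (action, source_file, dest_file)."""
    operations = []
    for root, _dirnames, filenames in os.walk(p1):
        for name in filenames:
            src = os.path.join(root, name)
            # filter_copy decides whether a file takes part in the sync.
            if filter_copy is not None and not filter_copy(src):
                continue
            dst = os.path.join(p2, os.path.relpath(src, p1))
            # Skip files whose copy is at least as recent as the source.
            if os.path.exists(dst) and os.path.getmtime(dst) >= os.path.getmtime(src):
                continue
            if not avoid_copy:
                os.makedirs(os.path.dirname(dst), exist_ok=True)
                shutil.copy2(src, dst)  # copy2 also preserves the modification time
            operations.append(("copy", src, dst))
    return operations
```

With avoid_copy=True the function only reports what it would copy, which is a cheap way to preview a synchronization.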

pyquickhelper.filehelper.synchelper.walk(top, onerror=None, followlinks=False, neg_filter=None)[source]

Does the same as os.walk but does not go through a sub-folder if it is big. Folders such as build, Debug, or Release may not need to be explored.

Parameters:
  • top – folder

  • onerror – see os.walk

  • followlinks – see os.walk

  • neg_filter – filtering, a string; every folder matching the filter is excluded (file pattern, not a regular expression pattern)

Returns:

see os.walk
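Excluding folders during a walk can be done by pruning the list of sub-folders that os.walk yields. The sketch below shows that technique only; walk_pruned is a hypothetical name, and matching neg_filter with fnmatch is an assumption about what "file pattern" means here.

```python
import fnmatch
import os


def walk_pruned(top, neg_filter=None):
    """Hypothetical wrapper around os.walk that skips folders matching neg_filter."""
    for root, dirnames, filenames in os.walk(top):
        if neg_filter is not None:
            # Modifying dirnames in place tells os.walk not to visit them.
            dirnames[:] = [d for d in dirnames
                           if not fnmatch.fnmatch(d, neg_filter)]
        yield root, dirnames, filenames
```

For instance, walk_pruned(top, "build") yields the same tuples as os.walk but never enters a folder named build.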
