module filehelper.synchelper

Short summary

module pyquickhelper.filehelper.synchelper

Series of functions related to folder, explore, synchronize, remove (recursively).

source on GitHub

Functions

function – truncated documentation

  • explore_folder – Returns the list of files included in a folder and its subfolders. Returned names can be modified if sub_pattern is …

  • explore_folder_iterfile – Same as explore_folder() but iterates over the files included in a folder and its subfolders.

  • explore_folder_iterfile_repo – Returns all files present in folder and added to a SVN or GIT repository.

  • has_been_updated – Assumes dest is a copy of source and checks whether the copy is up to date.

  • remove_folder – Removes everything in folder top.

  • synchronize_folder – Synchronizes two folders (or copies the first one if the second is empty). It only copies more recent files. It can walk through …

  • walk – Does the same as os.walk but does not go through a sub-folder if this one is big. Folders such as build …

Documentation

pyquickhelper.filehelper.synchelper.explore_folder(folder, pattern=None, neg_pattern=None, fullname=False, return_only=None, recursive=True, sub_pattern=None, sub_replace=None, fLOG=None)

Returns the list of files included in a folder and its subfolders. Returned names can be modified if sub_pattern is specified.

Parameters
  • folder – (str) folder

  • pattern – (str) if None, get all files, otherwise, it is a regular expression the filename must match (with the folder if fullname is True)

  • neg_pattern – (str) negative pattern

  • fullname – (bool) if True, include the subfolder while checking the regex (pattern)

  • return_only – (str) to return folders and files (=None), only the files (='f') or only the folders (='d')

  • recursive – (bool) look into subfolders

  • sub_pattern – (str) replacement pattern, the output is then modified according to this regular expression

  • sub_replace – (str) if sub_pattern is specified, this second pattern specifies how to replace

  • fLOG – (fct) logging function

Returns

(list, list), a list of folders and a list of files (the folder itself is not included in the path names)
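The filtering described by the parameters above can be sketched with the standard library alone. This is a minimal illustration, not pyquickhelper's implementation: the name list_folder, the use of re.search and the demo file names are assumptions made for the example.

```python
import os
import re
import tempfile


def list_folder(folder, pattern=None, neg_pattern=None, fullname=False):
    """Hypothetical stand-in for explore_folder (not the library's code)."""
    keep = re.compile(pattern) if pattern else None
    drop = re.compile(neg_pattern) if neg_pattern else None
    folders, files = [], []
    for root, dirs, names in os.walk(folder):
        rel_root = os.path.relpath(root, folder)
        for d in dirs:
            folders.append(os.path.normpath(os.path.join(rel_root, d)))
        for name in names:
            rel = os.path.normpath(os.path.join(rel_root, name))
            # with fullname=True the subfolder takes part in the regex check
            checked = rel if fullname else name
            if keep and not keep.search(checked):
                continue
            if drop and drop.search(checked):
                continue
            files.append(rel)
    return sorted(folders), sorted(files)


# Demo on a throwaway directory (hypothetical file names).
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "sub"))
    os.makedirs(os.path.join(tmp, ".ipynb_checkpoints"))
    for rel in ["a.ipynb",
                os.path.join("sub", "b.ipynb"),
                os.path.join("sub", "c.py"),
                os.path.join(".ipynb_checkpoints", "x.ipynb")]:
        open(os.path.join(tmp, rel), "w").close()
    folders, files = list_folder(tmp, pattern=r".*[.]ipynb",
                                 neg_pattern="checkpoints", fullname=True)
    print(folders)
    print(files)
```

The returned names are relative to the explored folder, which matches the note that the folder is not included in the path names.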

Explore the content of a directory

The command calls function explore_folder and makes the list of all files in a directory or all folders. Example:

python -m pyquickhelper ls -f _mynotebooks -r f -p .*[.]ipynb -n checkpoints -fu 1

It works better with chrome. An example to change file names:

python -m pyquickhelper ls -f myfolder -p .*[.]py -r f -n pycache -fu 1 -s test_(.*) -su unit_\1

Or another to automatically create git commands to rename files:

python -m pyquickhelper ls -f _mynotebooks -r f -p .*[.]ipynb -s "(.*)[.]ipynb" -su "git mv \1.ipynb \1~.ipynb"
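The effect of the -s / -su pair can be previewed with re.sub. The sketch below assumes the replacement behaves like a plain regular expression substitution on each returned name; the file names are hypothetical.

```python
import re

# Patterns taken from the commands above, applied to hypothetical file names.
renamed = re.sub(r"test_(.*)", r"unit_\1", "test_synchelper.py")
print(renamed)

git_cmd = re.sub(r"(.*)[.]ipynb", r"git mv \1.ipynb \1~.ipynb", "demo.ipynb")
print(git_cmd)
```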

<<<

python -m pyquickhelper ls --help

>>>

usage: ls [-h] [-f FOLDER] [-p PATTERN] [-n NEG_PATTERN] [-fu FULLNAME]
          [-r RETURN_ONLY] [-re RECURSIVE] [-s SUB_PATTERN] [-su SUB_REPLACE]

Returns the list of files included in a folder and its subfolders. Returned
names can be modified if *sub_pattern* is specified.

optional arguments:
  -h, --help            show this help message and exit
  -f FOLDER, --folder FOLDER
                        (str) folder (default: None)
  -p PATTERN, --pattern PATTERN
                        (str) if None, get all files, otherwise, it is a
                        regular expression, the filename must verify (with the
                        folder if fullname is True) (default: )
  -n NEG_PATTERN, --neg_pattern NEG_PATTERN
                        (str) negative pattern (default: )
  -fu FULLNAME, --fullname FULLNAME
                        (bool) if True, include the subfolder while checking
                        the regex (pattern) (default: False)
  -r RETURN_ONLY, --return_only RETURN_ONLY
                        (str) to return folders and files (=None), only the
                        files (='f') or only the folders (='d') (default: )
  -re RECURSIVE, --recursive RECURSIVE
                        (bool) look into subfolders (default: True)
  -s SUB_PATTERN, --sub_pattern SUB_PATTERN
                        (str) replacements pattern, the output is then
                        modified accordingly to this regular expression
                        (default: )
  -su SUB_REPLACE, --sub_replace SUB_REPLACE
                        (str) if sub_pattern is specified, this second pattern
                        specifies how to replace (default: )


Changed in version 1.7: Parameter fLOG was added.

Changed in version 1.8: Parameters return_only, recursive were added.

pyquickhelper.filehelper.synchelper.explore_folder_iterfile(folder, pattern=None, neg_pattern=None, fullname=False, recursive=True)

Same as explore_folder but iterates over the files included in a folder and its subfolders.

Parameters
  • folder – folder

  • pattern – if None, get all files, otherwise, it is a regular expression the filename must match (with the folder if fullname is True)

  • neg_pattern – negative pattern to exclude files

  • fullname – if True, include the subfolder while checking the regex

  • recursive – look into subfolders

Returns

iterator on files
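An iterator version lets the caller stop early without listing the whole tree. The generator below is a sketch under assumptions, not the library's code: iter_files is a hypothetical name, and recursive=False is emulated by pruning the directory list that os.walk hands back.

```python
import os
import re
import tempfile


def iter_files(folder, pattern=None, neg_pattern=None, recursive=True):
    """Hypothetical lazy counterpart of the listing function."""
    keep = re.compile(pattern) if pattern else None
    drop = re.compile(neg_pattern) if neg_pattern else None
    for root, dirs, names in os.walk(folder):
        if not recursive:
            dirs[:] = []  # emptying the list in place stops os.walk from descending
        for name in names:
            if keep and not keep.search(name):
                continue
            if drop and drop.search(name):
                continue
            yield os.path.join(root, name)


# Demo on a throwaway directory (hypothetical file names).
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "sub"))
    for rel in ["a.py", os.path.join("sub", "b.py"), os.path.join("sub", "c.txt")]:
        open(os.path.join(tmp, rel), "w").close()
    top_only = sorted(os.path.basename(f)
                      for f in iter_files(tmp, r"[.]py$", recursive=False))
    all_py = sorted(os.path.basename(f) for f in iter_files(tmp, r"[.]py$"))
    print(top_only, all_py)
```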

Changed in version 1.7: Parameter recursive was added.

pyquickhelper.filehelper.synchelper.explore_folder_iterfile_repo(folder, log=<function fLOG>)

Returns all files present in folder and added to a SVN or GIT repository.

Parameters
  • folder – folder

  • log – log function

Returns

iterator

pyquickhelper.filehelper.synchelper.has_been_updated(source, dest)

Assumes dest is a copy of source and checks whether the copy is up to date.

Parameters
  • source – filename

  • dest – copy

Returns

(True, reason) or (False, None)
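One plausible way to produce such a pair is to compare existence, size and modification time. The sketch below is an assumption about what such a check could look like, not the library's actual criteria; copy_is_outdated and the reason strings are hypothetical.

```python
import os
import shutil
import tempfile


def copy_is_outdated(source, dest):
    """Hypothetical check returning (True, reason) or (False, None)."""
    if not os.path.exists(dest):
        return True, "dest is missing"
    src_stat, dst_stat = os.stat(source), os.stat(dest)
    if src_stat.st_size != dst_stat.st_size:
        return True, "size differs"
    if src_stat.st_mtime > dst_stat.st_mtime:
        return True, "source is more recent"
    return False, None


# Demo on a throwaway directory (hypothetical file names).
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "data.txt")
    dst = os.path.join(tmp, "data_copy.txt")
    with open(src, "w") as f:
        f.write("hello")
    missing = copy_is_outdated(src, dst)
    shutil.copyfile(src, dst)
    # force identical timestamps so the comparison is deterministic
    os.utime(src, (1_000_000_000, 1_000_000_000))
    os.utime(dst, (1_000_000_000, 1_000_000_000))
    fresh = copy_is_outdated(src, dst)
    os.utime(src, (1_000_000_010, 1_000_000_010))
    stale = copy_is_outdated(src, dst)
    print(missing, fresh, stale)
```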

pyquickhelper.filehelper.synchelper.remove_folder(top, remove_also_top=True, raise_exception=True)

Removes everything in folder top.

Parameters
  • top – path to remove

  • remove_also_top – remove also root

  • raise_exception – raise an exception if a file cannot be removed

Returns

list of removed files and folders, as a list of tuples (name, "file" or "dir")
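A bottom-up walk is the usual way to empty a tree while recording what was deleted. The sketch below illustrates the idea with the standard library; remove_tree is a hypothetical name, and the sketch ignores symbolic links and read-only files, which a real implementation would have to handle.

```python
import os
import tempfile


def remove_tree(top, remove_also_top=True):
    """Hypothetical sketch: delete everything under top, return what was removed."""
    removed = []
    # topdown=False visits the deepest folders first, so each folder is empty
    # by the time os.rmdir is called on it
    for root, dirs, files in os.walk(top, topdown=False):
        for name in files:
            path = os.path.join(root, name)
            os.remove(path)
            removed.append((path, "file"))
        for name in dirs:
            path = os.path.join(root, name)
            os.rmdir(path)
            removed.append((path, "dir"))
    if remove_also_top:
        os.rmdir(top)
        removed.append((top, "dir"))
    return removed


# Demo on a throwaway directory (hypothetical file names).
with tempfile.TemporaryDirectory() as tmp:
    top = os.path.join(tmp, "top")
    os.makedirs(os.path.join(top, "sub"))
    open(os.path.join(top, "a.txt"), "w").close()
    open(os.path.join(top, "sub", "b.txt"), "w").close()
    removed = remove_tree(top)
    top_still_exists = os.path.exists(top)
    print(removed)
```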

pyquickhelper.filehelper.synchelper.synchronize_folder(p1: str, p2: str, hash_size=1048576, repo1=False, repo2=False, size_different=True, no_deletion=False, filter: [<class 'str'>, typing.Callable[[str], str], None] = None, filter_copy: [<class 'str'>, typing.Callable[[str], str], None] = None, avoid_copy=False, operations=None, file_date: str = None, log1=False, copy_1to2=False, create_dest=False, fLOG=<function fLOG>)

Synchronizes two folders (or copies the first one if the second is empty). It only copies more recent files. It can walk through a git repository or SVN.

Parameters
  • p1 – (str) first path

  • p2 – (str) second path

  • hash_size – (int) used to check whether or not two files are different

  • repo1 – (bool) assuming the first folder is under SVN or GIT, it uses pysvn to get the list of files (avoiding any extra files)

  • repo2 – (bool) assuming the second folder is under SVN or GIT, it uses pysvn to get the list of files (avoiding any extra files)

  • size_different – (bool) if True, a file will be copied only if the sizes are different, otherwise, it will be copied if the first file is more recent

  • no_deletion – (bool) if a file is found in the second folder and not in the first one, it will be removed unless no_deletion is True

  • filter – (str or function) None to accept every file, a string if it is a regular expression (applied with re.search), a function for something more complex: function(fullname) --> True (every file name is considered in lower case)

  • filter_copy – (str or function) None to accept every file, a string if it is a regular expression, a function for something more complex: function(fullname) --> True

  • avoid_copy – (bool) if True, just return the list of files which should be copied but does not do the copy

  • operations – if not None, this function is called the following way: operations(op, n1, n2); it should return True if the file was updated

  • file_date – (str) filename which contains information about when the last sync was done

  • log1 – see FileTreeNode

  • copy_1to2 – (bool) only copy files from p1 to p2

  • create_dest – (bool) create the destination directory if it does not exist

  • fLOG – logging function

Returns

list of operations done by the function, as a list of 3-tuples: action, source_file, dest_file

If file_date is specified, the second folder is not explored. Only the modified files will be taken into account (except for the first sync).

Synchronize two folders

The following function synchronizes a folder with another one on a USB drive or a network drive. To minimize the number of accesses to the other location, it stores the status of the previous synchronization in a file (status_copy.txt in the example below). Next time, the function goes through the directory and its sub-directories and only propagates the modifications made since the last synchronization. The function filter_copy defines which files to synchronize.

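from pyquickhelper.filehelper.synchelper import synchronize_folder

def filter_copy(file):
    # skip every file whose name contains this marker
    return "_don_t_synchronize_" not in file

synchronize_folder("c:/mydata",
                   "g:/mybackup",
                   hash_size=0,
                   filter_copy=filter_copy,
                   file_date="c:/status_copy.txt")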

The function can go through 90,000 files and 90 GB in 12 minutes (for an update).
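The core idea, copying only new or more recent files and reporting each operation as a 3-tuple, can be sketched without the library. This is a simplified one-way illustration under stated assumptions: sync_one_way and the "copy" action label are hypothetical, and deletion, hashing and the file_date status file are left out.

```python
import os
import shutil
import tempfile


def sync_one_way(p1, p2, filter_copy=None, avoid_copy=False):
    """Hypothetical sketch: copy new or more recent files from p1 to p2."""
    operations = []
    for root, _dirs, names in os.walk(p1):
        for name in names:
            src = os.path.join(root, name)
            if filter_copy is not None and not filter_copy(src):
                continue
            dst = os.path.join(p2, os.path.relpath(src, p1))
            # skip files whose copy is at least as recent as the source
            if os.path.exists(dst) and os.path.getmtime(src) <= os.path.getmtime(dst):
                continue
            if not avoid_copy:
                os.makedirs(os.path.dirname(dst), exist_ok=True)
                shutil.copy2(src, dst)  # copy2 keeps the modification time
            operations.append(("copy", src, dst))
    return operations


# Demo on throwaway directories (hypothetical file names).
with tempfile.TemporaryDirectory() as tmp:
    p1, p2 = os.path.join(tmp, "mydata"), os.path.join(tmp, "mybackup")
    os.makedirs(p1)
    os.makedirs(p2)
    for name in ["keep.txt", "_don_t_synchronize_.txt"]:
        with open(os.path.join(p1, name), "w") as f:
            f.write(name)
    first = sync_one_way(p1, p2, filter_copy=lambda f: "_don_t_synchronize_" not in f)
    second = sync_one_way(p1, p2, filter_copy=lambda f: "_don_t_synchronize_" not in f)
    copied = sorted(os.listdir(p2))
    print(first, second, copied)
```

The second call returns an empty list because the copies carry the same modification time as their sources, which mirrors the "only copies more recent files" behaviour described above.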

Changed in version 1.7: Parameter create_dest was added.

pyquickhelper.filehelper.synchelper.walk(top, onerror=None, followlinks=False, neg_filter=None)

Does the same as os.walk but does not go through a sub-folder if this one is big. Folders such as build, Debug or Release may not need to be dug into.

Parameters
  • top – folder

  • onerror – see os.walk

  • followlinks – see os.walk

  • neg_filter – filtering, a string, every folder matching the filter will be excluded (file pattern, not a regular expression pattern)

Returns

see os.walk
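Skipping folders during a walk is typically done by pruning the directory list that os.walk yields. The sketch below assumes the file pattern is tested against each folder name with fnmatch; walk_skipping is a hypothetical name and this is not the library's implementation.

```python
import fnmatch
import os
import tempfile


def walk_skipping(top, neg_filter=None):
    """Hypothetical sketch: os.walk that does not enter folders matching neg_filter."""
    for root, dirs, files in os.walk(top):
        if neg_filter is not None:
            # modifying dirs in place tells os.walk which sub-folders to skip
            dirs[:] = [d for d in dirs if not fnmatch.fnmatch(d, neg_filter)]
        yield root, dirs, files


# Demo on a throwaway directory (hypothetical folder names).
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "src"))
    os.makedirs(os.path.join(tmp, "build"))
    open(os.path.join(tmp, "src", "a.py"), "w").close()
    open(os.path.join(tmp, "build", "big.bin"), "w").close()
    visited = sorted(os.path.relpath(root, tmp)
                     for root, _dirs, _files in walk_skipping(tmp, "build*"))
    print(visited)
```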
