module filehelper.content_helper
Short summary
module pyensae.filehelper.content_helper
Various functions to process text
Functions
function | truncated documentation |
---|---|
enumerate_grep | Extracts lines matching a regular expression. |
file_encoding | Returns the encoding of a file. The function relies on chardet. … |
file_head | Extracts the first nbline lines of a file (assuming it is a text file). |
file_tail | Extracts the last nbline lines of a file (assuming it is a text file). |
replace_comma_by_point | Replaces all commas with points in a file (the file is modified in place). |
Documentation
- pyensae.filehelper.content_helper.enumerate_grep(filename, regex, encoding='utf8', errors=None)
Extracts lines matching a regular expression.
- Parameters:
filename – filename
regex – regular expression
encoding – encoding
errors – error handling scheme used when decoding, see the built-in function open
- Returns:
iterator over the matching lines
New in version 1.1.
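The behaviour described above can be sketched with the standard library alone. This is a minimal illustration, not the pyensae implementation: the name `grep_lines` is hypothetical, and the use of `re.search` (a match anywhere in the line) is an assumption.

```python
import re
from typing import Iterator, Optional


def grep_lines(filename: str, regex: str, encoding: str = "utf8",
               errors: Optional[str] = None) -> Iterator[str]:
    """Yield each line of a text file that matches ``regex`` (hypothetical helper)."""
    pattern = re.compile(regex)
    # ``errors`` is forwarded to open(); None means the default ('strict').
    with open(filename, "r", encoding=encoding, errors=errors) as f:
        for line in f:
            # Assumption: a match anywhere in the line is enough.
            if pattern.search(line):
                yield line
```

Because the function is a generator, the file is read lazily and only the matching lines are kept in memory, one at a time.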
- pyensae.filehelper.content_helper.file_encoding(filename_or_bytes, limit=1048576)
Returns the encoding of a file. The function relies on chardet.
- Parameters:
filename_or_bytes – filename or bytes
limit – if filename_or_bytes is a file, the function only loads the first limit bytes (or all if limit is -1)
- Returns:
dictionary
Example of results:
{'encoding': 'EUC-JP', 'confidence': 0.99}
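The role of limit can be illustrated with a short sketch. It assumes the detection step is a plain call to `chardet.detect`, which accepts bytes and returns a dictionary; the names `read_head_bytes` and `guess_encoding` are hypothetical and the actual pyensae code may differ.

```python
def read_head_bytes(filename: str, limit: int = 2 ** 20) -> bytes:
    """Read at most ``limit`` bytes, or the whole file when ``limit`` is -1."""
    with open(filename, "rb") as f:
        return f.read() if limit == -1 else f.read(limit)


def guess_encoding(filename_or_bytes, limit: int = 2 ** 20) -> dict:
    """Hypothetical wrapper: detect the encoding of bytes or of a file prefix."""
    import chardet  # third-party package, imported lazily

    if isinstance(filename_or_bytes, bytes):
        data = filename_or_bytes
    else:
        data = read_head_bytes(filename_or_bytes, limit)
    # chardet.detect returns a dictionary with 'encoding' and 'confidence' keys.
    return chardet.detect(data)
```

Reading only a prefix keeps the detection fast on large files, at the cost of a guess based on partial data.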
- pyensae.filehelper.content_helper.file_head(filename: str, nbline=10, encoding='utf8', errors='strict')
Extracts the first nbline lines of a file (assuming it is a text file).
- Parameters:
filename – filename
nbline – number of lines
encoding – encoding
errors – see open
- Returns:
list of lines
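A minimal sketch of the same idea, assuming nothing beyond the standard library. The name `head_lines` is hypothetical and this is not necessarily how pyensae implements it.

```python
from itertools import islice
from typing import List


def head_lines(filename: str, nbline: int = 10, encoding: str = "utf8",
               errors: str = "strict") -> List[str]:
    """Return the first ``nbline`` lines of a text file (hypothetical helper)."""
    with open(filename, "r", encoding=encoding, errors=errors) as f:
        # islice stops iterating after nbline lines,
        # so the remaining lines are never decoded.
        return list(islice(f, nbline))
```

If the file holds fewer than nbline lines, the sketch simply returns all of them.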
- pyensae.filehelper.content_helper.file_tail(filename: str, nbline=10, encoding='utf8', threshold=16384, errors='strict')
Extracts the last nbline lines of a file (assuming it is a text file).
- Parameters:
filename – filename
nbline – number of lines
encoding – encoding
threshold – if the file size is above this value, the function does not read the beginning of the file
errors – see open
- Returns:
list of lines
In the source code, the line marked as A has an issue: if the file is encoded in UTF-8, the cursor can fall on a byte in the middle of a multi-byte character, and the next line then fails. That is why the function tries again after moving the cursor by one byte (see the line marked as B).
The first returned line may be incomplete.
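The cursor problem can also be handled by reading the end of the file in binary mode and skipping any leading UTF-8 continuation bytes before decoding. The sketch below uses that alternative technique, not the retry approach of the library. It assumes a UTF-8 file, and the name `tail_lines` is hypothetical.

```python
import os
from typing import List


def tail_lines(filename: str, nbline: int = 10, threshold: int = 16384,
               errors: str = "strict") -> List[str]:
    """Return the last ``nbline`` lines of a UTF-8 text file (hypothetical helper)."""
    if nbline <= 0:
        return []
    size = os.path.getsize(filename)
    # Only the last ``threshold`` bytes are read when the file is larger.
    start = max(size - threshold, 0)
    with open(filename, "rb") as f:
        f.seek(start)
        raw = f.read()
    if start > 0:
        # The cursor may have landed inside a multi-byte character.
        # UTF-8 continuation bytes have the form 0b10xxxxxx: skip them.
        skip = 0
        while skip < len(raw) and (raw[skip] & 0xC0) == 0x80:
            skip += 1
        raw = raw[skip:]
    lines = raw.decode("utf8", errors=errors).splitlines(keepends=True)
    return lines[-nbline:]
```

As with the documented function, the first returned line may be incomplete when the file is larger than threshold, because the read can start in the middle of a line.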
- pyensae.filehelper.content_helper.replace_comma_by_point(file)
Replaces all commas with points in a file (the file is modified in place).
- Parameters:
file – file to process
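An in-place replacement of this kind can be sketched as a read followed by a rewrite of the same file. The name `commas_to_points` and its `encoding` parameter are hypothetical; the documented function takes only file.

```python
def commas_to_points(filename: str, encoding: str = "utf8") -> None:
    """Replace every comma with a point, overwriting the file (hypothetical helper)."""
    # Load the whole content first: the file is reopened for writing below.
    with open(filename, "r", encoding=encoding) as f:
        content = f.read()
    with open(filename, "w", encoding=encoding) as f:
        f.write(content.replace(",", "."))
```

This is typically useful for numeric files that use a decimal comma, but it also alters commas used as field separators, so it only suits files delimited by another character.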