module `filehelper.content_helper`

Short summary

module pyensae.filehelper.content_helper

Various functions to process text

Functions

function	truncated documentation
`enumerate_grep`	Extracts lines matching a regular expression.
`file_encoding`	Returns the encoding of a file. The function relies on chardet. …
`file_head`	Extracts the first nbline lines of a file (assuming it is a text file).
`file_tail`	Extracts the last nbline lines of a file (assuming it is a text file).
`replace_comma_by_point`	Replaces all commas with points in a file (in place).

Documentation

Various functions to process text

pyensae.filehelper.content_helper.enumerate_grep(filename, regex, encoding='utf8', errors=None)

Extracts lines matching a regular expression.

Parameters:

filename – filename
regex – regular expression
encoding – encoding
errors – see open

Returns:

iterator over the matching lines

New in version 1.1.
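
The behaviour described above can be sketched with the standard library alone. This is a minimal illustration, not the pyensae implementation: `grep_lines` is a hypothetical name, and the use of `re.search` (a match anywhere in the line) is an assumption, since the documentation does not say how the regular expression is applied.

```python
import os
import re
import tempfile
from typing import Iterator, Optional


def grep_lines(filename: str, regex: str, encoding: str = "utf8",
               errors: Optional[str] = None) -> Iterator[str]:
    """Yield every line of a text file that matches ``regex``.

    Hypothetical stand-in, not the pyensae implementation.
    """
    pattern = re.compile(regex)
    # ``errors`` is passed straight to open(); None selects the default policy.
    with open(filename, "r", encoding=encoding, errors=errors) as handle:
        for line in handle:
            # Assumption: a match anywhere in the line counts (re.search).
            if pattern.search(line):
                yield line


# Usage on a throwaway file.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "app.log")
    with open(path, "w", encoding="utf8") as handle:
        handle.write("INFO start\nERROR disk full\nINFO retry\nERROR timeout\n")
    matches = list(grep_lines(path, r"^ERROR"))
```

Because the function is a generator, the file is read lazily; the caller has to consume the iterator (here with `list`) while the file still exists.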

pyensae.filehelper.content_helper.file_encoding(filename_or_bytes, limit=1048576)

Returns the encoding of a file. The function relies on chardet.

Parameters:

filename_or_bytes – filename or bytes
limit – if filename_or_bytes is a file, the function only loads the first limit bytes (or all if limit is -1)

Returns:

dictionary

Example of results:

{'encoding': 'EUC-JP', 'confidence': 0.99}
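
The role of `limit` can be illustrated as follows. This is a sketch under stated assumptions: `read_prefix` and `guess_encoding` are hypothetical names, and the only chardet call used is `chardet.detect`, which takes bytes and returns a dictionary like the one above. chardet is a third-party package, so it is imported only when `guess_encoding` is called.

```python
import os
import tempfile
from typing import Union


def read_prefix(filename: str, limit: int = 1048576) -> bytes:
    """Return the first ``limit`` bytes of a file, or all of it if limit is -1."""
    with open(filename, "rb") as handle:
        return handle.read() if limit == -1 else handle.read(limit)


def guess_encoding(filename_or_bytes: Union[str, bytes],
                   limit: int = 1048576) -> dict:
    """Hypothetical stand-in: hand the sampled bytes to chardet.detect."""
    import chardet  # third-party; imported lazily so read_prefix works without it

    if isinstance(filename_or_bytes, bytes):
        data = filename_or_bytes
    else:
        data = read_prefix(filename_or_bytes, limit)
    return chardet.detect(data)


# Usage: only the sampling step is exercised here.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "sample.txt")
    with open(path, "wb") as handle:
        handle.write("日本語のテキスト".encode("euc_jp"))
    sample = read_prefix(path, limit=4)   # first 4 bytes only
    whole = read_prefix(path, limit=-1)   # the entire file
```

Sampling a prefix keeps detection fast on large files, at the cost of a less reliable guess when the first bytes are not representative of the whole file.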

pyensae.filehelper.content_helper.file_head(filename: str, nbline=10, encoding='utf8', errors='strict')

Extracts the first nbline lines of a file (assuming it is a text file).

Parameters:

filename – filename
nbline – number of lines
encoding – encoding
errors – see open

Returns:

list of lines
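
A minimal sketch of the same idea with the standard library, assuming that the returned lines keep their trailing newline (the documentation does not say). `head` is a hypothetical name, not the pyensae function.

```python
import os
import tempfile
from itertools import islice
from typing import List


def head(filename: str, nbline: int = 10, encoding: str = "utf8",
         errors: str = "strict") -> List[str]:
    """Return the first ``nbline`` lines of a text file (hypothetical helper)."""
    with open(filename, "r", encoding=encoding, errors=errors) as handle:
        # islice stops after nbline lines, so the rest of the file is never read.
        return list(islice(handle, nbline))


# Usage on a throwaway file.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "data.txt")
    with open(path, "w", encoding="utf8") as handle:
        handle.write("".join(f"row {i}\n" for i in range(50)))
    first = head(path, nbline=3)
```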

pyensae.filehelper.content_helper.file_tail(filename: str, nbline=10, encoding='utf8', threshold=16384, errors='strict')

Extracts the last nbline lines of a file (assuming it is a text file).

Parameters:

filename – filename
nbline – number of lines
encoding – encoding
threshold – if the file is larger than this threshold, the function does not read the beginning of the file
errors – see open

Returns:

list of lines

The line marked as A has an issue: if the file is encoded in UTF-8, the cursor may land on a byte in the middle of a multi-byte character, and reading the next line then fails. That is why the function tries again after moving the cursor by one byte (see line B).

The first returned line may be incomplete.
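
The cursor problem described above can be reproduced and handled in a few lines. The sketch below is not the pyensae implementation and uses a different technique from the retry it describes: it reads the end of the file as bytes and, for UTF-8 only, skips any leading continuation bytes (those of the form `10xxxxxx`) before decoding. `tail` is a hypothetical name.

```python
import os
import tempfile
from typing import List


def tail(filename: str, nbline: int = 10, encoding: str = "utf8",
         threshold: int = 16384, errors: str = "strict") -> List[str]:
    """Return the last ``nbline`` lines of a text file (hypothetical helper)."""
    size = os.path.getsize(filename)
    truncated = size > threshold
    with open(filename, "rb") as handle:
        if truncated:
            # Large file: skip the beginning and keep the last `threshold` bytes.
            handle.seek(size - threshold)
        data = handle.read()
    if truncated and encoding.lower().replace("-", "").replace("_", "") == "utf8":
        # The cut may fall inside a multi-byte character: drop the leading
        # UTF-8 continuation bytes so that decoding starts on a boundary.
        start = 0
        while start < len(data) and (data[start] & 0xC0) == 0x80:
            start += 1
        data = data[start:]
    lines = data.decode(encoding, errors=errors).splitlines(keepends=True)
    return lines[-nbline:] if nbline > 0 else []


# Usage: choose a threshold that makes the cut land in the middle of an "é".
raw = "".join(f"ligne é {i}\n" for i in range(100)).encode("utf8")
lead = raw.rfind("é".encode("utf8"), 0, len(raw) - 40)
cut = lead + 1  # offset of the second (continuation) byte of that "é"
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "lines.txt")
    with open(path, "wb") as handle:
        handle.write(raw)
    last = tail(path, nbline=3, threshold=len(raw) - cut)
```

As in the documented function, the first line of the window that was read may be incomplete; it only reaches the caller when `nbline` covers the whole window.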

pyensae.filehelper.content_helper.replace_comma_by_point(file)

Replaces all commas with points in a file (the file is modified in place).

Parameters:

file – file to process
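
A typical use is converting decimal commas in a data file to decimal points. The sketch below shows an equivalent in-place rewrite with the standard library; `commas_to_points` is a hypothetical name, and its `encoding` parameter is an assumption that the documented signature does not have.

```python
import os
import tempfile


def commas_to_points(file: str, encoding: str = "utf8") -> None:
    """Replace every comma with a point, overwriting the file (hypothetical helper)."""
    with open(file, "r", encoding=encoding) as handle:
        text = handle.read()
    # The whole content is held in memory, then written back to the same path.
    with open(file, "w", encoding=encoding) as handle:
        handle.write(text.replace(",", "."))


# Usage on a throwaway file with decimal commas.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "values.csv")
    with open(path, "w", encoding="utf8") as handle:
        handle.write("x;y\n1,5;2,25\n")
    commas_to_points(path)
    with open(path, "r", encoding="utf8") as handle:
        result = handle.read()
```

Every comma is replaced, including commas used as field separators, so this only suits files whose delimiter is something else (a semicolon or a tab, for instance).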
