module xmlhelper.html_parser_json

Inheritance diagram of pyrsslocal.xmlhelper.html_parser_json

Short summary

module pyrsslocal.xmlhelper.html_parser_json

Parses HTML and converts it into JSON.

source on GitHub

Classes

  • HTMLtoJSONParser – Parses HTML and outputs a JSON structure.

Functions

  • iterate_on_json – Iterates on every field contained in the JSON structure.

Properties

  • json – Returns the JSON structure.

Static Methods

  • iterate – Iterates on every field contained in the JSON structure.

  • to_json – Converts HTML into JSON.

Methods

  • __init__

  • clean – Cleans a dictionary of values.

  • handle_data – What to do with data.

  • handle_endtag – What to do for the end of a tag.

  • handle_starttag – What to do for a new tag.

Documentation

Parses HTML and converts it into JSON.

class pyrsslocal.xmlhelper.html_parser_json.HTMLtoJSONParser(raise_exception=True)

Bases: HTMLParser

Parses HTML and outputs a JSON structure. Example:

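from pyrsslocal.xmlhelper.html_parser_json import HTMLtoJSONParser

file = ...  # path to the HTML file
with open(file, "r", encoding="utf8") as f:
    content = f.read()  # read the whole HTML document
parser = HTMLtoJSONParser()
parser.feed(content)  # parse the HTML
js = parser.json  # retrieve the JSON structure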

Or:

js = HTMLtoJSONParser.to_json(content)

To iterate over all (path, value) pairs:

pairs = [(k, v) for k, v in HTMLtoJSONParser.iterate(js)]

Parameters:

raise_exception – if True, raises an exception if the HTML is malformed, otherwise does what it can

__init__(raise_exception=True)

Parameters:

raise_exception – if True, raises an exception if the HTML is malformed, otherwise does what it can

clean(values)

Cleans a dictionary of values.

handle_data(data)

What to do with data.

handle_endtag(tag)

What to do for the end of a tag.

handle_starttag(tag, attrs)

What to do for a new tag.
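
The three handlers above follow the callback pattern of the standard library's html.parser.HTMLParser. As a rough illustration of how such callbacks can build a nested structure, here is a minimal sketch. The class name TinyHTMLToDict, the node layout (tag, attrs, text, children) and the reading of "malformed" as an unmatched closing tag are all assumptions made for this example; this is not the pyrsslocal implementation.

```python
from html.parser import HTMLParser


class TinyHTMLToDict(HTMLParser):
    """Hypothetical stand-in, not pyrsslocal's HTMLtoJSONParser."""

    def __init__(self, raise_exception=True):
        super().__init__()
        self.raise_exception = raise_exception
        # Assumed node layout: {"tag", "attrs", "text", "children"}.
        self.root = {"tag": None, "attrs": {}, "text": "", "children": []}
        self._stack = [self.root]  # nodes that are currently open

    def handle_starttag(self, tag, attrs):
        # Open a new node under the current one and descend into it.
        # Void elements such as <br> are not special-cased in this sketch.
        node = {"tag": tag, "attrs": dict(attrs), "text": "", "children": []}
        self._stack[-1]["children"].append(node)
        self._stack.append(node)

    def handle_endtag(self, tag):
        # Close the current node; a mismatch is treated as malformed HTML.
        if len(self._stack) > 1 and self._stack[-1]["tag"] == tag:
            self._stack.pop()
        elif self.raise_exception:
            raise ValueError(f"unexpected closing tag </{tag}>")

    def handle_data(self, data):
        # Attach non-blank text to the node that is currently open.
        if data.strip():
            self._stack[-1]["text"] += data.strip()


parser = TinyHTMLToDict()
parser.feed('<div id="a"><p>hello</p><p>world</p></div>')
div = parser.root["children"][0]
print(div["tag"], div["attrs"], [c["text"] for c in div["children"]])
```

With raise_exception=False the sketch silently ignores an unmatched closing tag, which is one possible reading of "does what it can".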

static iterate(json_structure, prefix='', keep_dictionaries=False, skip=['__parent__'])

Iterates on every field contained in the JSON structure.

Parameters:
  • json_structure – json structure

  • prefix – prefix to add

  • keep_dictionaries – if True, also yields (k, v) pairs where v is a JSON dictionary

  • skip – the iteration does not enter the listed tags

Returns:

iterator of (path, value)
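
To make the parameters above concrete, the following sketch shows one way a generator over nested dictionaries and lists could behave. The function name iterate_sketch, the "/" path separator and the use of __parent__ as a back-reference to the enclosing node are assumptions for illustration; the actual path format produced by the library is not described here.

```python
def iterate_sketch(structure, prefix="", keep_dictionaries=False,
                   skip=("__parent__",)):
    """Yield (path, value) pairs from nested dicts and lists (illustrative)."""
    if isinstance(structure, dict):
        # Keys listed in `skip` are never entered.
        items = ((k, v) for k, v in structure.items() if k not in skip)
    elif isinstance(structure, list):
        items = enumerate(structure)
    else:
        # Leaf value: report it under the path built so far.
        yield prefix, structure
        return
    for key, value in items:
        path = f"{prefix}/{key}"  # assumed separator
        if keep_dictionaries and isinstance(value, dict):
            yield path, value  # also report the dictionary itself
        yield from iterate_sketch(value, path, keep_dictionaries, skip)


body = {"tag": "body", "p": [{"text": "hello"}, {"text": "world"}]}
body["p"][0]["__parent__"] = body  # back-reference, skipped by default

pairs = list(iterate_sketch(body))
print(pairs)
```

Skipping the back-reference key is what keeps this walk finite: entering it would lead back to the enclosing node and recurse forever.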

property json

Returns the JSON structure.

Returns:

json

static to_json(content, raise_exception=True)

Converts HTML into JSON.

Parameters:
  • content – HTML content to parse

  • raise_exception – if True, raises an exception if the HTML is malformed, otherwise does what it can

pyrsslocal.xmlhelper.html_parser_json.iterate_on_json(json_structure, prefix='', keep_dictionaries=False, skip=['__parent__'])

Iterates on every field contained in the JSON structure.

Parameters:
  • json_structure – json structure

  • prefix – prefix to add

  • keep_dictionaries – if True, also yields (k, v) pairs where v is a JSON dictionary

  • skip – the iteration does not enter the listed tags

Returns:

iterator of (path, value)
