quantlaw.utils package

Submodules

quantlaw.utils.beautiful_soup module

quantlaw.utils.beautiful_soup.create_soup(path)

Reads the file at path and returns a BeautifulSoup object parsed with the lxml-xml parser.

quantlaw.utils.beautiful_soup.find_parent_with_name(tag: str, name: str)
Parameters:
  • tag – A tag of a BeautifulSoup document
  • name – Tag name to search for among the tag's ancestors

Returns: The nearest ancestor with the given name
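
The ancestor lookup can be sketched with bs4 by following the `.parent` chain. This is an illustration, not quantlaw's source: it takes a bs4 `Tag` (the signature above annotates `tag` as `str`), and returning `None` when no ancestor matches is an assumption. bs4's own `Tag.find_parent(name)` performs a similar lookup.

```python
from typing import Optional

from bs4 import BeautifulSoup, Tag


def find_parent_with_name(tag: Tag, name: str) -> Optional[Tag]:
    """Return the nearest ancestor of `tag` whose tag name equals `name`."""
    current = tag.parent
    while current is not None:  # the BeautifulSoup root has parent None
        if current.name == name:
            return current
        current = current.parent
    return None  # assumption: no match yields None


# Usage on a small snippet ("html.parser" avoids the lxml dependency here).
soup = BeautifulSoup("<law><section id='s1'><p><b>text</b></p></section></law>", "html.parser")
bold = soup.find("b")
print(find_parent_with_name(bold, "section")["id"])  # s1
```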

quantlaw.utils.beautiful_soup.save_soup(soup: bs4.BeautifulSoup, path: str)

Writes a BeautifulSoup object to a file at the given path.

quantlaw.utils.files module

quantlaw.utils.files.ensure_exists(path: str)

Creates a folder if it does not exist yet.

Returns: The input path, whether or not the folder had to be created.
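
A minimal standard-library sketch of this behavior, assuming missing parent folders are created as well (as `os.makedirs` does); it is not quantlaw's own code.

```python
import os
import tempfile


def ensure_exists(path: str) -> str:
    """Create the folder at `path` unless it exists (assumption: parents too)."""
    os.makedirs(path, exist_ok=True)
    return path  # returned in any case, so the call can be used inline


# Usage: build an output folder inside a throwaway temporary directory.
base = tempfile.mkdtemp()
target = ensure_exists(os.path.join(base, "graphs", "2020"))
print(os.path.isdir(target))  # True
```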

quantlaw.utils.files.list_dir(path: str, type: str)

Lists the files in the folder at path, filtered by type.
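
A sketch of such a listing, assuming for illustration that `type` is a filename suffix such as `".xml"` and that only regular files count; the sorting is added here for deterministic output and is not documented.

```python
import os
import tempfile
from typing import List


def list_dir(path: str, type: str) -> List[str]:
    """Return the sorted names of files in `path` that end with `type`.

    Assumption: `type` is a filename suffix. The parameter keeps the
    documented name even though it shadows the builtin.
    """
    return sorted(
        name
        for name in os.listdir(path)
        if name.endswith(type) and os.path.isfile(os.path.join(path, name))
    )


# Usage: only regular files with the requested suffix are returned.
folder = tempfile.mkdtemp()
for name in ("a.xml", "b.txt", "c.xml"):
    open(os.path.join(folder, name), "w").close()
os.mkdir(os.path.join(folder, "sub.xml"))  # a folder, so it is skipped
print(list_dir(folder, ".xml"))  # ['a.xml', 'c.xml']
```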

quantlaw.utils.networkx module

quantlaw.utils.networkx.aggregate_attr_in_quotient_graph(nG, G, new_nodes, aggregation_attrs)

Sums attributes of nodes in an original graph per community and adds the sums to the corresponding nodes in a quotient graph.

Parameters:
  • nG – Quotient graph
  • G – Original graph
  • new_nodes – Mapping of nodes in the quotient graph to an iterable of nodes in the original graph that are represented by the node in the quotient graph.
  • aggregation_attrs – attributes to aggregate
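
The aggregation can be sketched with plain dictionaries standing in for the node attribute maps (`nG.nodes[n]` and `G.nodes[m]` in networkx). Treating a missing attribute as 0 is an assumption of this sketch, not documented behavior.

```python
from typing import Dict, Hashable, Iterable, Mapping


def aggregate_attr_in_quotient_graph(
    quotient_attrs: Dict[Hashable, dict],
    original_attrs: Mapping[Hashable, Mapping[str, float]],
    new_nodes: Mapping[Hashable, Iterable[Hashable]],
    aggregation_attrs: Iterable[str],
) -> None:
    """Sum each attribute over the original nodes of every community and
    store the sums on the corresponding quotient node (in place)."""
    attrs = list(aggregation_attrs)
    for community, members in new_nodes.items():
        members = list(members)  # the iterable is traversed once per attribute
        node_data = quotient_attrs.setdefault(community, {})
        for attr in attrs:
            # Assumption: a missing attribute counts as 0.
            node_data[attr] = sum(original_attrs[m].get(attr, 0) for m in members)


original = {"a1": {"tokens_n": 10}, "a2": {"tokens_n": 5}, "b1": {"tokens_n": 7}}
quotient = {"A": {}, "B": {}}
aggregate_attr_in_quotient_graph(quotient, original, {"A": ["a1", "a2"], "B": ["b1"]}, ["tokens_n"])
print(quotient)  # {'A': {'tokens_n': 15}, 'B': {'tokens_n': 7}}
```
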

quantlaw.utils.networkx.decay_function(key: int)

Returns a decay function to create a weighted sequence graph.

quantlaw.utils.networkx.get_leaves(G: networkx.classes.digraph.DiGraph)
Parameters: G – A tree as a directed graph with edges pointing from the root to the leaves

Returns: The set of leaves of the tree G
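
A sketch of the leaf computation, with the tree given as a hypothetical adjacency dict from each node to its children. For a networkx `DiGraph` the same set is `{n for n, d in G.out_degree() if d == 0}`.

```python
from typing import Dict, Hashable, List, Set


def get_leaves(successors: Dict[Hashable, List[Hashable]]) -> Set[Hashable]:
    """Return the nodes without outgoing edges in a tree directed root -> leaves.

    Assumption: `successors` maps nodes to child lists; leaves may map to an
    empty list or be absent as keys.
    """
    nodes = set(successors)
    for children in successors.values():
        nodes.update(children)
    return {n for n in nodes if not successors.get(n)}


tree = {"law": ["s1", "s2"], "s1": ["s1.p1", "s1.p2"], "s2": []}
print(sorted(get_leaves(tree)))  # ['s1.p1', 's1.p2', 's2']
```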

quantlaw.utils.networkx.get_new_edges(G, ordered_seqitems, seq_decay_func)

Convenience function that avoids a list comprehension spanning four lines.

quantlaw.utils.networkx.hierarchy_graph(G: networkx.classes.digraph.DiGraph, ignore_attrs=False)

Removes reference edges from G. This is a wrapper around induced_subgraph.

quantlaw.utils.networkx.induced_subgraph(G, filter_type, filter_attribute, filter_values, ignore_attrs=False)

Creates a custom induced subgraph.

Parameters:
  • filter_type – ‘node’ or ‘edge’
  • filter_attribute – attribute to filter on
  • filter_values – attribute values for which the filter evaluates to True
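
To make the two filter modes concrete, here is a sketch on a plain node dict and edge list rather than a networkx graph. For `'edge'` it keeps every node and only the matching edges, in line with how `hierarchy_graph` is described (removing reference edges); that choice, and the `type` node attribute in the usage, are assumptions.

```python
from typing import Any, Dict, Hashable, Iterable, List, Tuple

Edge = Tuple[Hashable, Hashable, Dict[str, Any]]


def induced_subgraph(
    nodes: Dict[Hashable, Dict[str, Any]],
    edges: List[Edge],
    filter_type: str,
    filter_attribute: str,
    filter_values: Iterable[Any],
) -> Tuple[Dict[Hashable, Dict[str, Any]], List[Edge]]:
    allowed = set(filter_values)
    if filter_type == "node":
        # Keep matching nodes and the edges that run between kept nodes.
        kept = {n: d for n, d in nodes.items() if d.get(filter_attribute) in allowed}
        return kept, [(u, v, d) for u, v, d in edges if u in kept and v in kept]
    if filter_type == "edge":
        # Assumption: all nodes survive, only matching edges are kept.
        return dict(nodes), [e for e in edges if e[2].get(filter_attribute) in allowed]
    raise ValueError("filter_type must be 'node' or 'edge'")


nodes = {"law": {"type": "document"}, "s1": {"type": "seqitem"}, "s2": {"type": "seqitem"}}
edges = [
    ("law", "s1", {"edge_type": "containment"}),
    ("law", "s2", {"edge_type": "containment"}),
    ("s1", "s2", {"edge_type": "reference"}),
]
all_nodes, hierarchy_edges = induced_subgraph(nodes, edges, "edge", "edge_type", ["containment"])
print([(u, v) for u, v, _attrs in hierarchy_edges])  # [('law', 's1'), ('law', 's2')]
seq_nodes, seq_edges = induced_subgraph(nodes, edges, "node", "type", ["seqitem"])
print(sorted(seq_nodes))  # ['s1', 's2']
```
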

quantlaw.utils.networkx.load_graph_from_csv_files(crossreference_folder, file_basename, filter='exclude_subseqitems', filter_by_edge_types=None)

Loads a networkx MultiDiGraph from a node list and an edge list stored as .csv.gz files. The node CSV must have a ‘key’ column that serves as the node key; other columns are added as node attributes. The edge CSV must have the columns ‘u’, ‘v’ and ‘edge_type’. If filter is None, all nodes are loaded. By default, subseqitems are excluded. If filter is a callable, it is called with a pandas.DataFrame loaded from the CSV as its only argument and must return values to filter the DataFrame.

Parameters:
  • crossreference_folder – Folder containing the node and edge lists
  • file_basename – base filename of the node and edge lists (suffixed with ‘.nodes.csv.gz’ and ‘.edges.csv.gz’)
  • filter – Filters the nodes to load. Options: “exclude_subseqitems”, None, or a function that filters a pandas.DataFrame
  • filter_by_edge_types – Filters the edges to load. None includes all edges. You can also provide a list of edge types, e.g. [‘containment’, ‘reference’].
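
A standard-library sketch of the file layout described above, returning a node dict and an edge list instead of a MultiDiGraph. The `node_filter` parameter is a hypothetical stand-in for `filter`: it receives one CSV row as a dict rather than a pandas.DataFrame, and the exclude_subseqitems default is not reproduced. Dropping edges whose endpoints were filtered out is also an assumption.

```python
import csv
import gzip
import os
import tempfile
from typing import Callable, Dict, Iterable, List, Optional, Tuple

Attrs = Dict[str, str]


def load_graph_from_csv_files(
    crossreference_folder: str,
    file_basename: str,
    node_filter: Optional[Callable[[Attrs], bool]] = None,  # hypothetical
    filter_by_edge_types: Optional[Iterable[str]] = None,
) -> Tuple[Dict[str, Attrs], List[Tuple[str, str, Attrs]]]:
    base = os.path.join(crossreference_folder, file_basename)

    # Nodes: the 'key' column is the node key, other columns become attributes.
    nodes: Dict[str, Attrs] = {}
    with gzip.open(base + ".nodes.csv.gz", "rt", newline="") as f:
        for row in csv.DictReader(f):
            if node_filter is None or node_filter(row):
                key = row.pop("key")
                nodes[key] = row

    # Edges: 'u', 'v' and 'edge_type' are required; None keeps every edge type.
    allowed = None if filter_by_edge_types is None else set(filter_by_edge_types)
    edges: List[Tuple[str, str, Attrs]] = []
    with gzip.open(base + ".edges.csv.gz", "rt", newline="") as f:
        for row in csv.DictReader(f):
            if allowed is not None and row["edge_type"] not in allowed:
                continue
            u, v = row.pop("u"), row.pop("v")
            if u in nodes and v in nodes:  # assumption: skip edges to dropped nodes
                edges.append((u, v, row))
    return nodes, edges


# Usage with two tiny gzipped CSV files in a temporary folder.
folder = tempfile.mkdtemp()
with gzip.open(os.path.join(folder, "laws.nodes.csv.gz"), "wt", newline="") as f:
    f.write("key,type\nlaw,document\ns1,seqitem\ns2,seqitem\n")
with gzip.open(os.path.join(folder, "laws.edges.csv.gz"), "wt", newline="") as f:
    f.write("u,v,edge_type\nlaw,s1,containment\nlaw,s2,containment\ns1,s2,reference\n")

nodes, edges = load_graph_from_csv_files(folder, "laws", filter_by_edge_types=["reference"])
print(sorted(nodes))  # ['law', 's1', 's2']
print(edges)  # [('s1', 's2', {'edge_type': 'reference'})]
```
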

quantlaw.utils.networkx.multi_to_weighted(G: networkx.classes.multidigraph.MultiDiGraph)

Converts a multidigraph into a weighted digraph.
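
One plausible conversion, sketched with `collections.Counter`: parallel edges u -> v collapse into a single edge whose weight is their count. That the weight is the edge count is an assumption here; with networkx each pair could then be added via `DiGraph.add_edge(u, v, weight=w)`.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, Tuple


def multi_to_weighted(
    edges: Iterable[Tuple[Hashable, Hashable]],
) -> Dict[Tuple[Hashable, Hashable], int]:
    """Collapse parallel edges u -> v into one edge weighted by their count
    (assumption: weight = number of parallel edges)."""
    return dict(Counter((u, v) for u, v in edges))


multi_edges = [("s1", "s2"), ("s1", "s2"), ("s2", "s1"), ("s1", "s3")]
print(multi_to_weighted(multi_edges))
# {('s1', 's2'): 2, ('s2', 's1'): 1, ('s1', 's3'): 1}
```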

quantlaw.utils.networkx.quotient_graph(G, node_attribute, edge_types=['reference', 'cooccurrence'], self_loops=False, root_level=-1, aggregation_attrs=('chars_n', 'chars_nowhites', 'tokens_n', 'tokens_unique'))

Generates the quotient graph, in which all nodes sharing the same value of node_attribute are condensed into a single node. The simplest use case is aggregation by law_name.
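
A dictionary-based sketch of the condensation, assuming edges are given as `(u, v, edge_type)` triples and that the weight of a quotient edge is the number of original edges it replaces. `root_level` and `aggregation_attrs` are not modeled; the default `edge_types` is a tuple here to avoid a mutable default argument.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, Iterable, Set, Tuple


def quotient_graph(
    node_attrs: Dict[Hashable, Dict[str, Hashable]],
    edges: Iterable[Tuple[Hashable, Hashable, str]],
    node_attribute: str,
    edge_types: Iterable[str] = ("reference", "cooccurrence"),
    self_loops: bool = False,
) -> Tuple[Dict[Hashable, Set[Hashable]], Dict[Tuple[Hashable, Hashable], int]]:
    # Condense: every distinct value of node_attribute becomes one quotient node.
    members: Dict[Hashable, Set[Hashable]] = defaultdict(set)
    for node, attrs in node_attrs.items():
        members[attrs[node_attribute]].add(node)

    # Map kept edges onto quotient nodes; assumption: weight = edge count.
    allowed = set(edge_types)
    weights: Counter = Counter()
    for u, v, edge_type in edges:
        if edge_type not in allowed:
            continue
        cu, cv = node_attrs[u][node_attribute], node_attrs[v][node_attribute]
        if cu == cv and not self_loops:
            continue
        weights[(cu, cv)] += 1
    return dict(members), dict(weights)


nodes = {"a1": {"law_name": "A"}, "a2": {"law_name": "A"}, "b1": {"law_name": "B"}}
edges = [
    ("a1", "b1", "reference"),
    ("a2", "b1", "reference"),
    ("a1", "a2", "reference"),
    ("a1", "a2", "containment"),
]
groups, weights = quotient_graph(nodes, edges, "law_name")
print(weights)  # {('A', 'B'): 2}
```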

quantlaw.utils.networkx.sequence_graph(G: networkx.classes.multidigraph.MultiDiGraph, seq_decay_func=<function decay_function.<locals>.<lambda>>, seq_ref_ratio=1)

Creates a sequence graph for G consisting only of seqitems and their cross-references, in which neighboring seqitems are connected by edges in both directions.

Parameters:
  • seq_decay_func – function that calculates the weight of a sequence edge from the distance between neighboring nodes
  • seq_ref_ratio – ratio of the weight of a sequence edge between nodes at minimum distance from each other to the weight of a reference edge
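
A hypothetical sketch of how the two parameters could interact. It assumes each seqitem has a numeric position, that the distance between neighbors is the difference of positions, and that a reference edge weighs 1, so neighbors at the minimum distance receive seq_ref_ratio and farther neighbors are scaled by seq_decay_func. `sequence_graph_sketch`, `positions` and the `1 / d` decay are illustrative choices, not quantlaw's.

```python
from typing import Callable, Dict, Hashable, Iterable, Sequence, Tuple

WeightedEdges = Dict[Tuple[Hashable, Hashable], float]


def sequence_graph_sketch(
    ordered_seqitems: Sequence[Hashable],
    positions: Dict[Hashable, float],  # hypothetical: strictly increasing
    seq_decay_func: Callable[[float], float],
    seq_ref_ratio: float = 1.0,
    references: Iterable[Tuple[Hashable, Hashable]] = (),
) -> WeightedEdges:
    edges: WeightedEdges = {}
    pairs = list(zip(ordered_seqitems, ordered_seqitems[1:]))
    if pairs:
        # Assumption: distance is the difference of the neighbors' positions.
        distances = [positions[b] - positions[a] for a, b in pairs]
        # Scale so neighbors at minimum distance weigh seq_ref_ratio times a
        # reference edge, which is assumed to weigh 1.
        scale = seq_ref_ratio / seq_decay_func(min(distances))
        for (a, b), d in zip(pairs, distances):
            weight = scale * seq_decay_func(d)
            edges[(a, b)] = weight  # sequence edges point in both directions
            edges[(b, a)] = weight
    for u, v in references:
        edges[(u, v)] = edges.get((u, v), 0.0) + 1.0  # reference edge weight 1
    return edges


graph = sequence_graph_sketch(
    ["s1", "s2", "s3"],
    {"s1": 0, "s2": 1, "s3": 3},
    seq_decay_func=lambda d: 1 / d,  # hypothetical decay function
    seq_ref_ratio=2,
    references=[("s1", "s3")],
)
print(graph)
```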

quantlaw.utils.pipeline module

class quantlaw.utils.pipeline.PipelineStep(processes=None, execute_args=[])

Bases: object

chunksize = None
execute_filtered_items(items, filters=None, *args, **kwargs)
execute_item(item)
execute_items(items)
finish_execution(results)
get_items() → list
max_number_of_processes = 1
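
The member names suggest a template-method pattern: a subclass supplies get_items and execute_item, and the base class runs the items, possibly in several processes, before finish_execution post-processes the results. The `MiniPipelineStep` below is a hypothetical stand-in written under that assumption; it does not reproduce the real class (execute_args and execute_filtered_items are omitted, and `run` is an invented convenience method).

```python
from multiprocessing import Pool
from typing import Any, List, Optional


class MiniPipelineStep:
    """Hypothetical stand-in mirroring some PipelineStep member names."""

    max_number_of_processes = 1
    chunksize: Optional[int] = None

    def __init__(self, processes: Optional[int] = None) -> None:
        # Assumption: fall back to the class-level process count.
        self.processes = processes if processes is not None else self.max_number_of_processes

    def get_items(self) -> list:
        raise NotImplementedError  # supplied by subclasses

    def execute_item(self, item: Any) -> Any:
        raise NotImplementedError  # supplied by subclasses

    def execute_items(self, items: list) -> List[Any]:
        if self.processes <= 1:
            return [self.execute_item(item) for item in items]
        with Pool(self.processes) as pool:
            return pool.map(self.execute_item, items, chunksize=self.chunksize)

    def finish_execution(self, results: List[Any]) -> List[Any]:
        return results  # hook for post-processing

    def run(self) -> List[Any]:
        """Hypothetical driver: get items, execute them, finish."""
        return self.finish_execution(self.execute_items(self.get_items()))


class SquareStep(MiniPipelineStep):
    def get_items(self) -> list:
        return [1, 2, 3]

    def execute_item(self, item: int) -> int:
        return item * item


print(SquareStep().run())  # [1, 4, 9]
```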

Module contents