quantlaw.utils package
Submodules
quantlaw.utils.beautiful_soup module
quantlaw.utils.beautiful_soup.create_soup(path)
Reads a file and returns a BeautifulSoup object parsed with the lxml-xml parser.
quantlaw.utils.files module
quantlaw.utils.networkx module
quantlaw.utils.networkx.aggregate_attr_in_quotient_graph(nG, G, new_nodes, aggregation_attrs)
Sums attributes of nodes in an original graph per community and adds the sum to the nodes in a quotient graph.
Parameters:
- nG – Quotient graph
- G – Original graph
- new_nodes – Mapping of nodes in the quotient graph to an iterable of the nodes in the original graph that each quotient node represents
- aggregation_attrs – Attributes to aggregate
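The aggregation described above can be illustrated with a minimal sketch. It is not the library's implementation: the function name aggregate_attrs_sketch is hypothetical, and treating a missing attribute as 0 is an assumption the source does not confirm.

```python
import networkx as nx


def aggregate_attrs_sketch(nG, G, new_nodes, aggregation_attrs):
    """Sum each attribute over the original nodes behind every quotient node."""
    for quotient_node, original_nodes in new_nodes.items():
        for attr in aggregation_attrs:
            # Assumption: nodes lacking the attribute contribute 0 to the sum.
            total = sum(G.nodes[n].get(attr, 0) for n in original_nodes)
            nG.nodes[quotient_node][attr] = total
    return nG


# Original graph: three nodes carrying a character count.
G = nx.DiGraph()
G.add_node("a1", chars_n=10)
G.add_node("a2", chars_n=5)
G.add_node("b1", chars_n=7)

# Quotient graph: one node per community.
nG = nx.DiGraph()
nG.add_nodes_from(["A", "B"])

aggregate_attrs_sketch(nG, G, {"A": ["a1", "a2"], "B": ["b1"]}, ["chars_n"])
```

After the call, each quotient node carries the sum of chars_n over the original nodes mapped to it.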
quantlaw.utils.networkx.decay_function(key: int)
Returns a decay function to create a weighted sequence graph.
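The source does not say how key selects the decay, so the following is only a sketch of the general pattern: a factory that returns a function mapping a distance to an edge weight. The name decay_function_sketch and the inverse power law (with key as the exponent, and distances of at least 1) are assumptions, not the library's behavior.

```python
def decay_function_sketch(key: int):
    """Return a function mapping a distance (>= 1) to an edge weight.

    Hypothetical scheme: key is the exponent of an inverse power law,
    and key == 0 means no decay at all.
    """
    if key == 0:
        return lambda distance: 1.0
    return lambda distance: 1.0 / (distance ** key)


linear_decay = decay_function_sketch(1)
quadratic_decay = decay_function_sketch(2)
weights = [linear_decay(d) for d in (1, 2, 4)]
```

Returning a closure keeps the choice of decay separate from the code that builds the edges, which only has to call the returned function with a distance.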
quantlaw.utils.networkx.get_leaves(G: networkx.classes.digraph.DiGraph)
Parameters:
- G – A tree as a directed graph with edges from root to leaves
Returns: Set of leaves of the tree G
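For a tree whose edges point from root to leaves, the leaves are exactly the nodes with no outgoing edges. A minimal sketch of that idea follows; get_leaves_sketch is a hypothetical name and the library's own implementation may differ.

```python
import networkx as nx


def get_leaves_sketch(G: nx.DiGraph) -> set:
    """Collect the nodes without outgoing edges."""
    return {node for node in G.nodes if G.out_degree(node) == 0}


tree = nx.DiGraph()
tree.add_edges_from([("root", "a"), ("root", "b"), ("a", "c")])
leaves = get_leaves_sketch(tree)
```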
quantlaw.utils.networkx.get_new_edges(G, ordered_seqitems, seq_decay_func)
Convenience function to avoid list comprehension over four lines.
quantlaw.utils.networkx.hierarchy_graph(G: networkx.classes.digraph.DiGraph, ignore_attrs=False)
Removes reference edges from G. Wrapper around induced_subgraph.
quantlaw.utils.networkx.induced_subgraph(G, filter_type, filter_attribute, filter_values, ignore_attrs=False)
Creates a custom induced subgraph.
Parameters:
- filter_type – ‘node’ or ‘edge’
- filter_attribute – Attribute to filter on
- filter_values – Attribute values that evaluate to True
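The filtering described by these parameters can be sketched with networkx's own subgraph and edge_subgraph methods. This is an illustration on a plain DiGraph, not the library's code: induced_subgraph_sketch is a hypothetical name, ignore_attrs is left out because the source does not describe it, and edge_subgraph drops nodes that have no kept edge, which the real function may not do.

```python
import networkx as nx


def induced_subgraph_sketch(G, filter_type, filter_attribute, filter_values):
    """Keep only nodes or edges whose attribute value is in filter_values."""
    if filter_type == "node":
        keep = [
            n for n, data in G.nodes(data=True)
            if data.get(filter_attribute) in filter_values
        ]
        return G.subgraph(keep).copy()
    if filter_type == "edge":
        keep = [
            (u, v) for u, v, data in G.edges(data=True)
            if data.get(filter_attribute) in filter_values
        ]
        return G.edge_subgraph(keep).copy()
    raise ValueError("filter_type must be 'node' or 'edge'")


G = nx.DiGraph()
G.add_node("a", type="law")
G.add_node("b", type="seqitem")
G.add_node("c", type="seqitem")
G.add_edge("a", "b", edge_type="containment")
G.add_edge("b", "c", edge_type="containment")
G.add_edge("a", "c", edge_type="reference")

# Dropping reference edges, in the spirit of hierarchy_graph.
hierarchy = induced_subgraph_sketch(G, "edge", "edge_type", ["containment"])
seqitems = induced_subgraph_sketch(G, "node", "type", ["seqitem"])
```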
quantlaw.utils.networkx.load_graph_from_csv_files(crossreference_folder, file_basename, filter='exclude_subseqitems', filter_by_edge_types=None)
Loads a networkx MultiDiGraph from a node list and an edge list formatted as .csv.gz files. The node csv must have a ‘key’ column that serves as the node key. Other columns are added as node attributes. The edge csv must have the columns ‘u’, ‘v’ and ‘edge_type’. If filter is None, all nodes will be loaded. By default, subseqitems will be excluded. If filter is a callable, it is called with a pandas.DataFrame loaded from the csv as the only argument. The callable must return values used to filter the DataFrame.
Parameters:
- crossreference_folder – Folder containing the edge lists
- file_basename – Base filename of the edge lists (will be suffixed with ‘.nodes.csv.gz’ and ‘.edges.csv.gz’)
- filter – Filters the nodes to load. Options: “exclude_subseqitems”, None, or a function that filters a pandas.DataFrame
- filter_by_edge_types – Filters the edges to load. None includes all edges. You can also provide a list of edge types, e.g. ['containment', 'reference'].
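To make the expected file layout concrete, the sketch below writes two tiny .csv.gz files and loads them into a MultiDiGraph. It is a simplified stand-in, not the library function: load_graph_sketch is a hypothetical name, it uses the standard csv and gzip modules instead of pandas, it omits the node filter entirely, and the sample column law_name and file basename demo are made up for the example.

```python
import csv
import gzip
import os
import tempfile

import networkx as nx


def load_graph_sketch(folder, basename, edge_types=None):
    """Load '<basename>.nodes.csv.gz' and '<basename>.edges.csv.gz'."""
    G = nx.MultiDiGraph()
    nodes_path = os.path.join(folder, basename + ".nodes.csv.gz")
    with gzip.open(nodes_path, "rt", newline="") as f:
        for row in csv.DictReader(f):
            key = row.pop("key")  # 'key' column is the node key
            G.add_node(key, **row)  # remaining columns become attributes
    edges_path = os.path.join(folder, basename + ".edges.csv.gz")
    with gzip.open(edges_path, "rt", newline="") as f:
        for row in csv.DictReader(f):
            # None keeps every edge; otherwise keep only the listed types.
            if edge_types is None or row["edge_type"] in edge_types:
                G.add_edge(row["u"], row["v"], edge_type=row["edge_type"])
    return G


def write_gz(path, text):
    with gzip.open(path, "wt", newline="") as f:
        f.write(text)


tmp = tempfile.mkdtemp()
write_gz(os.path.join(tmp, "demo.nodes.csv.gz"),
         "key,law_name\na,LawA\nb,LawB\n")
write_gz(os.path.join(tmp, "demo.edges.csv.gz"),
         "u,v,edge_type\na,b,reference\na,b,containment\n")

all_edges = load_graph_sketch(tmp, "demo")
refs_only = load_graph_sketch(tmp, "demo", edge_types=["reference"])
```

A MultiDiGraph is needed here because the same pair of nodes can be linked by several edges of different types.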
quantlaw.utils.networkx.multi_to_weighted(G: networkx.classes.multidigraph.MultiDiGraph)
Converts a multidigraph into a weighted digraph.
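One common way to do such a conversion is to collapse parallel edges and store their count as the weight. The sketch below assumes exactly that; the source does not state how the weight is computed, and multi_to_weighted_sketch is a hypothetical name.

```python
import networkx as nx


def multi_to_weighted_sketch(G: nx.MultiDiGraph) -> nx.DiGraph:
    """Collapse parallel edges; weight = number of parallel edges (assumed)."""
    H = nx.DiGraph()
    H.add_nodes_from(G.nodes(data=True))
    for u, v in G.edges():
        if H.has_edge(u, v):
            H[u][v]["weight"] += 1
        else:
            H.add_edge(u, v, weight=1)
    return H


M = nx.MultiDiGraph()
M.add_edges_from([("a", "b"), ("a", "b"), ("b", "c")])
W = multi_to_weighted_sketch(M)
```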
quantlaw.utils.networkx.quotient_graph(G, node_attribute, edge_types=['reference', 'cooccurrence'], self_loops=False, root_level=-1, aggregation_attrs=('chars_n', 'chars_nowhites', 'tokens_n', 'tokens_unique'))
Generates the quotient graph in which all nodes sharing the same node_attribute value are condensed into a single node. The simplest use case is aggregation by law_name.
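The condensation step can be sketched as follows. This is a reduced illustration under several assumptions: quotient_graph_sketch is a hypothetical name, every node is assumed to carry node_attribute, the result is assumed to be a MultiDiGraph, and the edge_types, root_level and aggregation_attrs parameters are not modeled.

```python
import networkx as nx


def quotient_graph_sketch(G, node_attribute, self_loops=False):
    """Condense all nodes sharing a node_attribute value into one node."""
    group_of = {n: data[node_attribute] for n, data in G.nodes(data=True)}
    nG = nx.MultiDiGraph()
    nG.add_nodes_from(set(group_of.values()))
    for u, v, data in G.edges(data=True):
        gu, gv = group_of[u], group_of[v]
        # Edges inside one group become self-loops and are kept on request.
        if gu != gv or self_loops:
            nG.add_edge(gu, gv, **data)
    return nG


G = nx.MultiDiGraph()
G.add_node("a1", law_name="A")
G.add_node("a2", law_name="A")
G.add_node("b1", law_name="B")
G.add_edge("a1", "b1", edge_type="reference")
G.add_edge("a1", "a2", edge_type="reference")

by_law = quotient_graph_sketch(G, "law_name")
with_loops = quotient_graph_sketch(G, "law_name", self_loops=True)
```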
quantlaw.utils.networkx.sequence_graph(G: networkx.classes.multidigraph.MultiDiGraph, seq_decay_func=<function decay_function.<locals>.<lambda>>, seq_ref_ratio=1)
Creates the sequence graph for G, consisting only of seqitems and their cross-references, where neighboring seqitems are connected via edges in both directions.
Parameters:
- seq_decay_func – Function to calculate the sequence edge weight based on the distance between neighboring nodes
- seq_ref_ratio – Ratio between the weight of a sequence edge whose nodes are at minimum distance from each other in the sequence and the weight of a reference edge