quantlaw.utils package
Submodules
quantlaw.utils.beautiful_soup module
-
quantlaw.utils.beautiful_soup.create_soup(path)
Reads the file at path and returns a BeautifulSoup object parsed with the lxml-xml parser.
quantlaw.utils.files module
quantlaw.utils.networkx module
-
quantlaw.utils.networkx.aggregate_attr_in_quotient_graph(nG, G, new_nodes, aggregation_attrs)
Sums attributes of nodes in an original graph per community and adds the sums to the corresponding nodes in a quotient graph.
Parameters: - nG – Quotient graph
- G – Original graph
- new_nodes – Mapping of each node in the quotient graph to an iterable of the nodes in the original graph that it represents
- aggregation_attrs – Attributes to aggregate
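The aggregation described above can be sketched without networkx, using plain dictionaries in place of the graphs' node attribute stores. The helper name `sum_attrs_per_community`, the sample data, and the choice to treat a missing attribute as 0 are illustrative assumptions, not part of quantlaw.

```python
def sum_attrs_per_community(node_attrs, new_nodes, aggregation_attrs):
    """Sum selected attributes of original nodes per quotient-graph node.

    node_attrs: original node -> attribute dict (stands in for G.nodes)
    new_nodes:  quotient node -> iterable of original nodes it represents
    """
    totals = {}
    for community, members in new_nodes.items():
        members = list(members)  # allow any iterable, including generators
        totals[community] = {
            # Assumption: a node lacking the attribute contributes 0.
            attr: sum(node_attrs[n].get(attr, 0) for n in members)
            for attr in aggregation_attrs
        }
    return totals


# Hypothetical sample data.
node_attrs = {
    "a1": {"tokens_n": 10, "chars_n": 50},
    "a2": {"tokens_n": 5, "chars_n": 20},
    "b1": {"tokens_n": 7, "chars_n": 30},
}
new_nodes = {"A": ["a1", "a2"], "B": ["b1"]}
totals = sum_attrs_per_community(node_attrs, new_nodes, ["tokens_n", "chars_n"])
```

In the library function, the sums are written onto the nodes of nG rather than returned.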
-
quantlaw.utils.networkx.decay_function(key: int)
Returns a decay function to create a weighted sequence graph.
-
quantlaw.utils.networkx.get_leaves(G: networkx.classes.digraph.DiGraph)
Parameters: G – A tree as a directed graph with edges from root to leaves
Returns: Set of leaves of the tree G
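In a tree whose edges point from the root towards the leaves, the leaves are the nodes without outgoing edges. A dependency-free sketch of that idea follows; the adjacency-dict representation and the helper name `leaves` are assumptions for illustration, not the library's implementation.

```python
def leaves(children):
    """Return the nodes with no outgoing edges.

    children: node -> list of child nodes (edges point root -> leaves)
    """
    nodes = set(children)
    for kids in children.values():
        nodes.update(kids)  # include nodes that appear only as children
    return {n for n in nodes if not children.get(n)}


# Hypothetical tree: root -> ch1 -> {s1, s2}, root -> ch2
tree = {"root": ["ch1", "ch2"], "ch1": ["s1", "s2"], "ch2": []}
result = leaves(tree)
```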
-
quantlaw.utils.networkx.get_new_edges(G, ordered_seqitems, seq_decay_func)
Convenience function to avoid list comprehension over four lines.
-
quantlaw.utils.networkx.hierarchy_graph(G: networkx.classes.digraph.DiGraph, ignore_attrs=False)
Removes reference edges from G. Wrapper around induced_subgraph.
-
quantlaw.utils.networkx.induced_subgraph(G, filter_type, filter_attribute, filter_values, ignore_attrs=False)
Creates a custom induced subgraph.
Parameters: - filter_type – ‘node’ or ‘edge’
- filter_attribute – attribute to filter on
- filter_values – attribute values for which the filter evaluates to True
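The two filter types can be sketched on plain Python data. The helper names, the sample attributes, and the exact semantics are assumptions for illustration: here an edge filter keeps every edge whose attribute value is listed, and a node filter keeps the matching nodes plus the edges between them. The library's behavior may differ in detail.

```python
def filter_edges(edges, filter_attribute, filter_values):
    """Keep edges whose attribute value is one of filter_values."""
    return [
        (u, v, d) for u, v, d in edges
        if d.get(filter_attribute) in filter_values
    ]


def filter_nodes(nodes, edges, filter_attribute, filter_values):
    """Keep matching nodes and the edges running between them."""
    kept = {
        n for n, d in nodes.items()
        if d.get(filter_attribute) in filter_values
    }
    kept_edges = [(u, v, d) for u, v, d in edges if u in kept and v in kept]
    return kept, kept_edges


# Hypothetical graph: one document containing two seqitems, one reference.
nodes = {
    "law": {"type": "document"},
    "s1": {"type": "seqitem"},
    "s2": {"type": "seqitem"},
}
edges = [
    ("law", "s1", {"edge_type": "containment"}),
    ("law", "s2", {"edge_type": "containment"}),
    ("s1", "s2", {"edge_type": "reference"}),
]

# Dropping reference edges leaves only the hierarchy.
hierarchy = filter_edges(edges, "edge_type", ["containment"])
seq_nodes, seq_edges = filter_nodes(nodes, edges, "type", ["seqitem"])
```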
-
quantlaw.utils.networkx.load_graph_from_csv_files(crossreference_folder, file_basename, filter='exclude_subseqitems', filter_by_edge_types=None)
Loads a networkx MultiDiGraph from a node list and an edge list stored as .csv.gz files. The node csv must have a ‘key’ column that serves as the node key; all other columns are added as node attributes. The edge csv must have the columns ‘u’, ‘v’ and ‘edge_type’. If filter is None, all nodes are loaded. By default, subseqitems are excluded. If filter is a callable, it is called with a pandas.DataFrame loaded from the csv as its only argument and must return values used to filter the DataFrame.
Parameters: - crossreference_folder – Folder containing the edgelists
- file_basename – base filename of the edgelists (will be suffixed with ‘.nodes.csv.gz’ and ‘.edges.csv.gz’)
- filter – Filters the nodes to load. Options: “exclude_subseqitems”, None, or a function that filters a pandas.DataFrame
- filter_by_edge_types – Filters the edges to load. None includes all edges. You can also provide a list of edge_types, e.g. [‘containment’, ‘reference’].
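The file layout described above can be illustrated with the standard library alone. This sketch writes a minimal node list and edge list using the documented suffixes and required columns, then reads them back. The basename `example`, the `law_name` column, and the sample rows are hypothetical, and the sketch does not call quantlaw itself.

```python
import csv
import gzip
import os
import tempfile

folder = tempfile.mkdtemp()   # stands in for crossreference_folder
basename = "example"          # hypothetical file_basename


def write_csv_gz(path, header, rows):
    """Write a gzip-compressed CSV file with a header row."""
    with gzip.open(path, "wt", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)


def read_csv_gz(path):
    """Read a gzip-compressed CSV file into a list of dicts."""
    with gzip.open(path, "rt", newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


nodes_path = os.path.join(folder, basename + ".nodes.csv.gz")
edges_path = os.path.join(folder, basename + ".edges.csv.gz")

# Node csv: a 'key' column plus arbitrary attribute columns.
write_csv_gz(nodes_path, ["key", "law_name"], [["n1", "Law A"], ["n2", "Law B"]])
# Edge csv: the columns 'u', 'v' and 'edge_type'.
write_csv_gz(
    edges_path,
    ["u", "v", "edge_type"],
    [["n1", "n2", "reference"], ["n1", "n2", "containment"]],
)

nodes = read_csv_gz(nodes_path)
edges = read_csv_gz(edges_path)

# Keeping only selected edge types mirrors the idea of filter_by_edge_types.
wanted = ["reference"]
edges = [e for e in edges if e["edge_type"] in wanted]
```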
-
quantlaw.utils.networkx.multi_to_weighted(G: networkx.classes.multidigraph.MultiDiGraph)
Converts a multidigraph into a weighted digraph.
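One common way to turn a multidigraph into a weighted digraph is to collapse parallel edges and use their count as the edge weight. The sketch below shows that idea on a plain edge list. That the library weights edges by this count is an assumption, not something stated on this page.

```python
from collections import Counter

# Hypothetical multigraph edges; ("a", "b") occurs three times.
multi_edges = [("a", "b"), ("a", "b"), ("b", "c"), ("a", "b")]

# Count parallel edges per ordered (u, v) pair.
weights = Counter(multi_edges)

# One weighted edge per distinct pair.
weighted_edges = [(u, v, w) for (u, v), w in weights.items()]
```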
-
quantlaw.utils.networkx.quotient_graph(G, node_attribute, edge_types=['reference', 'cooccurrence'], self_loops=False, root_level=-1, aggregation_attrs=('chars_n', 'chars_nowhites', 'tokens_n', 'tokens_unique'))
Generates the quotient graph, in which all nodes sharing the same value of node_attribute are condensed into a single node. The simplest use case is aggregation by law_name.
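The condensation step can be sketched on plain data: each node is mapped to its attribute value, and edges between nodes become edges between those values. The helper name, the sample data, and the use of edge counts as weights are assumptions for illustration. Filtering by edge_types, root_level, and attribute aggregation are left out.

```python
from collections import Counter


def quotient_edges(edges, group_of, self_loops=False):
    """Condense edges between nodes into edges between their groups.

    edges:    iterable of (u, v) pairs in the original graph
    group_of: node -> value of the grouping attribute (e.g. law_name)
    """
    counts = Counter()
    for u, v in edges:
        gu, gv = group_of[u], group_of[v]
        if gu == gv and not self_loops:
            continue  # drop edges that stay inside one group
        counts[(gu, gv)] += 1
    return dict(counts)


# Hypothetical data: three nodes belonging to two laws.
law_of = {"a1": "Law A", "a2": "Law A", "b1": "Law B"}
edges = [("a1", "b1"), ("a2", "b1"), ("a1", "a2"), ("b1", "a1")]

without_loops = quotient_edges(edges, law_of)
with_loops = quotient_edges(edges, law_of, self_loops=True)
```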
-
quantlaw.utils.networkx.sequence_graph(G: networkx.classes.multidigraph.MultiDiGraph, seq_decay_func=<function decay_function.<locals>.<lambda>>, seq_ref_ratio=1)
Creates the sequence graph for G, consisting of seqitems and their cross-references only, where neighboring seqitems are connected via edges in both directions.
Parameters: - seq_decay_func – function to calculate sequence edge weight based on distance between neighboring nodes
- seq_ref_ratio – ratio between a sequence edge weight when nodes in the sequence are at minimum distance from each other and a reference edge weight