quantlaw.de_extract package

Submodules

quantlaw.de_extract.load_statute_names module

quantlaw.de_extract.load_statute_names.load_law_names(date, path)[source]

quantlaw.de_extract.statutes_abstract module

class quantlaw.de_extract.statutes_abstract.StatusMatch(text: str, start: int, end: int)[source]

Bases: object

Base class to report the areas of citations to German statutes and regulations. It is also used if a trigger, e.g. ‘§’, is found but not followed by a citation.

has_main_area()[source]

Returns True if the match has a main area and thus its content can be parsed by StatutesParser

class quantlaw.de_extract.statutes_abstract.StatutesMatchWithMainArea(suffix_len: int, law_len: int, law_match_type: str, *args, **kwargs)[source]

Bases: quantlaw.de_extract.statutes_abstract.StatusMatch

Class to report the areas of citations to German statutes and regulations in which a main area follows the trigger. E.g. in “§ 123”, “123” is the main area.

has_main_area()[source]

Returns True if the match has a main area and thus its content can be parsed by StatutesParser

law_text()[source]

Returns: The referenced law.

main_text()[source]

Returns: The main area of a citation, i.e. the part of the text that specifies the cited part of the statute, omitting the name of the law, e.g. “§ 123 Abs. 4, Nr 5 und 6”.

suffix_text()[source]

Returns: The text that joins the main area of the citation with the name of the cited law. An empty string is returned if no law is specified, which is typically the case for references within a law.

class quantlaw.de_extract.statutes_abstract.StatutesProcessor(laws_lookup: dict)[source]

Bases: object

Abstract class to extract and parse statute references. It provides the names with which laws are cited.

laws_lookup

A dictionary of the law names to extract. Keys are the names that the source text uses to cite laws; values are unique identifiers of the laws. For optimal results, it is recommended to make the dictionary as exhaustive as possible: this reduces the chance that references are falsely treated as internal references within a law because the name of the referenced law is not recognized. The law names should be provided in stemmed form, using quantlaw.de_extract.stemming.stem_law_name.
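To illustrate the expected shape of laws_lookup, the sketch below maps several citation forms of the same law to one identifier. It does not call quantlaw: toy_stem is a hypothetical stand-in for stem_law_name (it lowercases, collapses whitespace and drops a trailing genitive ‘s’), and the identifiers ‘BGB’ and ‘StGB’ are assumed rather than quantlaw's actual ID scheme.

```python
import re


def toy_stem(name: str) -> str:
    """Hypothetical stand-in for stem_law_name; the real stemmer differs."""
    # Collapse runs of whitespace and lowercase the name.
    name = re.sub(r"\s+", " ", name.strip()).lower()
    # Drop a trailing genitive "s" (e.g. "strafgesetzbuchs" -> "strafgesetzbuch").
    return re.sub(r"s$", "", name)


# Citation forms found in source texts, mapped to an assumed unique law ID.
CITATION_NAMES = {
    "BGB": "BGB",
    "Bürgerliches Gesetzbuch": "BGB",
    "Bürgerlichen Gesetzbuchs": "BGB",
    "StGB": "StGB",
    "Strafgesetzbuch": "StGB",
    "Strafgesetzbuchs": "StGB",
}

# Keys are stemmed so that they can be compared with stemmed text.
laws_lookup = {toy_stem(name): law_id for name, law_id in CITATION_NAMES.items()}
print(laws_lookup)
```

The two inflected forms of “Strafgesetzbuch” collapse onto a single key after stemming, which is why the keys should be stemmed with the same function that is later applied to the text.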

match_law_name(text: str)[source]

Checks if the text begins with a law name provided in self.laws_lookup_keys.

Returns: The matched substring.
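The prefix check described above can be sketched as a longest-prefix match over the lookup keys. This is an assumed approach, not quantlaw's code: match_law_name_sketch and the IDs ‘SGB-5’ and ‘SGB-6’ are hypothetical, and returning None when nothing matches is also an assumption.

```python
from typing import Optional


def match_law_name_sketch(text: str, laws_lookup: dict) -> Optional[str]:
    """Return the longest key of laws_lookup that text starts with, else None."""
    best = None
    for name in laws_lookup:
        # Prefer longer names so that "SGB VI" is not cut short as "SGB V".
        if text.startswith(name) and (best is None or len(name) > len(best)):
            best = name
    return best


lookup = {"SGB V": "SGB-5", "SGB VI": "SGB-6", "BGB": "BGB"}
print(match_law_name_sketch("SGB VI in der Fassung", lookup))  # SGB VI
```

Preferring the longest key matters whenever one law name is a prefix of another, as “SGB V” is of “SGB VI”.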

quantlaw.de_extract.statutes_areas module

class quantlaw.de_extract.statutes_areas.StatutesExtractor(laws_lookup: dict)[source]

Bases: quantlaw.de_extract.statutes_abstract.StatutesProcessor

Class to find areas of citations to German statutes and regulations

find_all(text: str, pos: int = 0)[source]

Like search but returns a generator of all matches found in text
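Since find_all is described as the generator counterpart of search, one common way to build it is to call search repeatedly, restarting at the end of each match. The sketch below assumes that pattern; toy_search, toy_find_all and ToyMatch are hypothetical, and the regular expression is a deliberately crude stand-in for StatutesExtractor's patterns.

```python
import re
from dataclasses import dataclass
from typing import Iterator, Optional

# Crude stand-in pattern: a trigger ("§", "§§" or "Art.") followed by a number.
TOY_PATTERN = re.compile(r"(?:§§?|Art\.)\s*\d+[a-z]?")


@dataclass
class ToyMatch:
    """Hypothetical match record mirroring StatusMatch's (text, start, end)."""
    text: str
    start: int
    end: int


def toy_search(text: str, pos: int = 0) -> Optional[ToyMatch]:
    """Return the first match at or after pos, or None."""
    m = TOY_PATTERN.search(text, pos)
    return ToyMatch(text, m.start(), m.end()) if m else None


def toy_find_all(text: str, pos: int = 0) -> Iterator[ToyMatch]:
    """Yield all matches by restarting the search after each previous match."""
    match = toy_search(text, pos)
    while match is not None:
        yield match
        # Advance at least one character to avoid looping on empty matches.
        match = toy_search(text, max(match.end, match.start + 1))


sample = "Nach § 823 BGB und Art. 3 GG gilt § 5a."
print([sample[m.start:m.end] for m in toy_find_all(sample)])
# ['§ 823', 'Art. 3', '§ 5a']
```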

get_dict_law_name_len(test_str)[source]

Determines whether test_str starts with a law name given in self.laws_lookup.

Returns: The length of the matched law name, or 0.

static get_eu_law_name_len(test_str) → int[source]

Returns: The length of the name of a piece of European legislation in chars, or 0 if no law name of this type was found.

static get_ignore_law_name_len(test_str)[source]

Returns: The length of a law name to ignore in chars, or 0 if no law name of this type was found.

static get_no_suffix_ignore_law_name_len(test_str) → int[source]

Returns: The length of the law name in chars if no suffix connects the main area with the law name, or 0 if no law name of this type was found.

static get_sgb_law_name_len(test_str) → int[source]

Returns: The length of the SGB law name in chars, or 0 if no law name of this type was found.

get_suffix_and_law_name(string: str)[source]

Returns: A tuple containing

  1. the length of the article between the numbers and the law name (e.g. “ der ”),
  2. the length of the law name as it appears in the given string,
  3. the type of the reference.

If nothing is found, both lengths are 0.
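The returned triple can be illustrated with a toy version. It assumes that the string starts right after the main area, that the suffix is whitespace optionally surrounding an article such as ‘des’ or ‘der’, and that law names come from a plain list. The function name, the regular expression and the type label ‘dict’ (borrowed from the match_type example of parse_law) are illustrative assumptions, not quantlaw's implementation.

```python
import re
from typing import Optional, Tuple

# Hypothetical suffix: an optional article surrounded by whitespace, e.g. " des ".
SUFFIX_RE = re.compile(r"(?:\s+(?:des|der|im|in der))?\s+")


def get_suffix_and_law_name_sketch(
    string: str, law_names: list
) -> Tuple[int, int, Optional[str]]:
    """Return (suffix length, law name length, reference type)."""
    suffix = SUFFIX_RE.match(string)
    if suffix is not None:
        rest = string[suffix.end():]
        # Try longer names first so that the most specific name wins.
        for name in sorted(law_names, key=len, reverse=True):
            if rest.startswith(name):
                return suffix.end(), len(name), "dict"
    # Nothing found: both lengths are 0 (the None type is an assumption).
    return 0, 0, None


names = ["BGB", "Bürgerlichen Gesetzbuchs"]
print(get_suffix_and_law_name_sketch(" des Bürgerlichen Gesetzbuchs gilt", names))
```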

search(text: str, pos: int = 0) → quantlaw.de_extract.statutes_abstract.StatusMatch[source]

Finds the next occurrence of a statute reference in a given text

Parameters:
  • text – The text to search in.
  • pos – Position to start searching.

Returns: The match or None if no references are found.

quantlaw.de_extract.statutes_areas_patterns module

quantlaw.de_extract.statutes_parse module

exception quantlaw.de_extract.statutes_parse.NoUnitMatched[source]

Bases: Exception

Exception raised if a unit in a reference cannot be parsed.

class quantlaw.de_extract.statutes_parse.StatutesParser(laws_lookup: dict)[source]

Bases: quantlaw.de_extract.statutes_abstract.StatutesProcessor

Class to parse the content of a reference area identified by StatutesExtractor

static fix_errors_in_citation(citation)[source]

Fix some common inconsistencies in the references such as double spaces.

static infer_units(reference_path, prev_reference_path)[source]

In some enumerations, a numeric value is not directly prefixed by its unit, e.g. “§ 123 Abs. 1 S. 2, 3 S. 4”. Here, “3” has no unit. By looking at the whole citation, it can be inferred that its unit is the next higher unit above “S.”, i.e. “Abs.”. The inferred units are added to the parsed data.
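A minimal sketch of this inference, assuming a fixed unit hierarchy and the [unit, value] path format described for parse_main. HIERARCHY, its order and infer_units_sketch are hypothetical simplifications. A unit-less value takes the unit one level above the unit that follows it; as an additional assumption, a bare value with nothing after it inherits the innermost unit of the previous reference path (as for “35” in “§§ 20 und 35”).

```python
# Hypothetical unit hierarchy, from the highest to the lowest level.
HIERARCHY = ["§", "Abs", "Satz", "Nr"]


def infer_units_sketch(reference_path, prev_reference_path):
    """Fill in missing units (None) of reference_path in place and return it."""
    for i, component in enumerate(reference_path):
        if component[0] is not None:
            continue
        if i + 1 < len(reference_path):
            following = reference_path[i + 1][0]
            if following in HIERARCHY and HIERARCHY.index(following) > 0:
                # "3 S. 4": the unit of "3" is one level above "Satz", i.e. "Abs".
                component[0] = HIERARCHY[HIERARCHY.index(following) - 1]
        elif prev_reference_path:
            # Assumption: a trailing bare value reuses the previous innermost unit.
            component[0] = prev_reference_path[-1][0]
    return reference_path


# "§ 123 Abs. 1 S. 2, 3 S. 4": the second part lacks the unit of "3".
previous = [["§", "123"], ["Abs", "1"], ["Satz", "2"]]
print(infer_units_sketch([[None, "3"], ["Satz", "4"]], previous))
# [['Abs', '3'], ['Satz', '4']]
```

Completing the second path with the shared “§ 123” prefix is not shown here.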

static is_numb(token: str)[source]

Returns: True if the token is a ‘numeric’ value of the reference.

static is_pre_numb(token: str)[source]

Returns: True if the token is a number that comes before the unit. E.g. ‘erster Halbsatz’

static is_unit(token: str)[source]

Returns: True if the token is a unit

parse_law(law_text: str, match_type: str, current_lawid: str = None)[source]

Parses the law information from a reference found by StatutesMatchWithMainArea.

Parameters:
  • law_text – E.g. “BGB”
  • match_type – E.g. “dict”

Returns: The key of the parsed law.

parse_main(main_text: str) → list[source]

Parses a string containing a reference to a specific section within a given law, e.g. “§ 123 Abs. 4 Satz 5 und 6”. The parsed information is formatted as three levels of nested lists.

The outer list is a list of references.

References are lists of path components. A path component is e.g. “Abs. 4”.

A path component is represented by a list with two elements: the first contains the unit, the second the value.

The example above would be represented as [[['§', '123'], ['Abs', '4'], ['Satz', '5']], [['§', '123'], ['Abs', '4'], ['Satz', '6']]].

Parameters: main_text – The string to parse.

Returns: The parsed reference.
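To make the nested format concrete, the sketch below takes the documented output for “§ 123 Abs. 4 Satz 5 und 6” and renders each reference path back into a flat string. The helper render_references is hypothetical and only consumes the documented structure; it does not call parse_main.

```python
def render_references(references):
    """Join each reference path of [unit, value] pairs into one string."""
    return [" ".join(f"{unit} {value}" for unit, value in path) for path in references]


# Output documented above for "§ 123 Abs. 4 Satz 5 und 6".
parsed = [
    [["§", "123"], ["Abs", "4"], ["Satz", "5"]],
    [["§", "123"], ["Abs", "4"], ["Satz", "6"]],
]
for line in render_references(parsed):
    print(line)
# § 123 Abs 4 Satz 5
# § 123 Abs 4 Satz 6
```

Each reference repeats the shared prefix (“§ 123”, “Abs 4”), so every inner list can be resolved on its own.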

static split_citation_into_enum_parts(citation)[source]

A citation can contain references to multiple parts of the law, e.g. ‘§§ 20 und 35’ or ‘Art. 3 Abs. 1 Satz 1, Abs. 3 Satz 1’. The citation is split so that each referenced part of the law is separated, e.g. into ‘§§ 20’ and ‘35’, or into ‘Art. 3 Abs. 1 Satz 1’ and ‘Abs. 3 Satz 1’. Ranges, however, are not split: e.g. “§§ 1 bis 10” is kept as one part.
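A regular-expression sketch of this splitting, assuming that ‘,’, ‘und’ and ‘oder’ are the enumeration separators. ENUM_SEPARATOR and split_enum_parts_sketch are hypothetical; because ‘bis’ is not treated as a separator, ranges stay intact.

```python
import re

# Hypothetical separators: a comma, or "und"/"oder" surrounded by whitespace.
ENUM_SEPARATOR = re.compile(r"\s*,\s*|\s+(?:und|oder)\s+")


def split_enum_parts_sketch(citation: str) -> list:
    """Split a citation into enumerated parts; ranges ("bis") are kept whole."""
    return [part for part in ENUM_SEPARATOR.split(citation) if part]


print(split_enum_parts_sketch("Art. 3 Abs. 1 Satz 1, Abs. 3 Satz 1"))
# ['Art. 3 Abs. 1 Satz 1', 'Abs. 3 Satz 1']
```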

static split_citation_part(string: str)[source]

The string is tokenized. Tokens are identified as units or values. Pairs are built to connect the units with their respective values. If a unit cannot be identified (and must be inferred later), None is returned in its place.

Parameters: string – A string that is part of a reference and cites one part of a statute.

Returns: A generator yielding tuples, each containing the unit (or None) and the respective value.

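A whitespace-tokenizing sketch of this pairing. The set UNIT_TOKENS and the name split_part_sketch are hypothetical, units are yielded unstemmed, and pre-numbers such as ‘erster Halbsatz’ are not handled.

```python
from typing import Iterator, Optional, Tuple

# Hypothetical, incomplete set of unit spellings.
UNIT_TOKENS = {"§", "§§", "Art.", "Abs.", "S.", "Satz", "Nr.", "Nr"}


def split_part_sketch(string: str) -> Iterator[Tuple[Optional[str], str]]:
    """Yield (unit or None, value) pairs for one part of a citation."""
    pending_unit = None
    for token in string.split():
        if token in UNIT_TOKENS:
            pending_unit = token  # remember the unit for the next value
        else:
            yield pending_unit, token
            pending_unit = None  # a value consumes its unit


print(list(split_part_sketch("3 S. 4")))
# [(None, '3'), ('S.', '4')]
```

The None in the first pair is the kind of gap that infer_units is meant to fill.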
static split_parts_accidently_joined(reference_paths)[source]

Reformats the parsed references to separate accidentally joined references. E.g. the original reference “§ 123 § 126” is not split by split_citation_into_enum_parts because the separation is not indicated by a ‘,’, ‘or’, etc. However, since the unit ‘§’ occurs twice, it can be inferred that the citation refers to two parts of statutes. This function handles the case in which the unit ‘§’ or ‘Art’ appears twice in the same reference path and splits the path into several elements.
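The splitting rule can be sketched as follows, assuming the [unit, value] path format from parse_main and that only ‘§’ and ‘Art’ start a new reference. split_joined_sketch and TOP_LEVEL_UNITS are hypothetical names.

```python
# Units that, per the description above, may start a new reference path.
TOP_LEVEL_UNITS = {"§", "Art"}


def split_joined_sketch(reference_paths):
    """Split every path in which a top-level unit occurs more than once."""
    result = []
    for path in reference_paths:
        current = []
        for component in path:
            # A second top-level unit closes the current path and opens a new one.
            if component[0] in TOP_LEVEL_UNITS and any(
                unit in TOP_LEVEL_UNITS for unit, _ in current
            ):
                result.append(current)
                current = []
            current.append(component)
        if current:
            result.append(current)
    return result


# "§ 123 § 126" parsed as a single path:
print(split_joined_sketch([[["§", "123"], ["§", "126"]]]))
# [[['§', '123']], [['§', '126']]]
```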

static stem_unit(unit: str)[source]

Brings a unit into a standard format, e.g. by resolving abbreviations, grammatical variants, spelling errors, etc.

Parameters: unit – A string containing a unit that should be converted into a standard format.

Returns: The unit in a standard format as a string, e.g. §, Art, Nr, Halbsatz, Anhang, …

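A lookup-table sketch of this normalization. The variant table is a small hypothetical sample, stem_unit_sketch is not quantlaw's function, and raising ValueError for unknown units is an assumption (the real code may behave differently).

```python
# Hypothetical sample of spellings, keyed in lowercase without trailing dots.
_UNIT_VARIANTS = {
    "§": "§", "§§": "§",
    "art": "Art", "artikel": "Art",
    "abs": "Abs", "absatz": "Abs", "absätze": "Abs",
    "s": "Satz", "satz": "Satz", "sätze": "Satz",
    "nr": "Nr", "nummer": "Nr",
    "hs": "Halbsatz", "halbsatz": "Halbsatz",
    "anhang": "Anhang",
}


def stem_unit_sketch(unit: str) -> str:
    """Map a unit spelling such as 'Abs.' or 'Artikel' to a standard form."""
    key = unit.strip().rstrip(".").lower()
    try:
        return _UNIT_VARIANTS[key]
    except KeyError:
        raise ValueError(f"unknown unit: {unit!r}") from None


print(stem_unit_sketch("Abs."), stem_unit_sketch("Artikel"), stem_unit_sketch("§§"))
# Abs Art §
```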
exception quantlaw.de_extract.statutes_parse.StringCaseException[source]

Bases: Exception

Exception raised if a unit in a reference cannot be parsed. In this case, the cause is often upper- or lower-case formatting.

quantlaw.de_extract.statutes_parse_patterns module

quantlaw.de_extract.statutes_parse_patterns.generate_sgb_dict()[source]

Returns a dictionary whose keys are the different ways in which the books of the SGB (Sozialgesetzbuch) are cited. Each key is mapped to the key identifying the respective SGB book.
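A sketch of how such a dictionary could be generated for books I to XII. The citation variants and the value format ‘SGB-<n>’ are hypothetical; quantlaw's actual keys and values may differ.

```python
ROMAN = ["I", "II", "III", "IV", "V", "VI", "VII", "VIII", "IX", "X", "XI", "XII"]
ORDINALS = [
    "Erstes", "Zweites", "Drittes", "Viertes", "Fünftes", "Sechstes",
    "Siebtes", "Achtes", "Neuntes", "Zehntes", "Elftes", "Zwölftes",
]


def generate_sgb_dict_sketch() -> dict:
    """Map several citation forms of each SGB book to a hypothetical book key."""
    sgb = {}
    for number, (roman, ordinal) in enumerate(zip(ROMAN, ORDINALS), start=1):
        key = f"SGB-{number}"  # assumed key format
        genitive = ordinal[:-2] + "en"  # "Fünftes" -> "Fünften"
        for variant in (
            f"SGB {roman}",
            f"{ordinal} Buch Sozialgesetzbuch",
            f"{genitive} Buches Sozialgesetzbuch",
        ):
            sgb[variant] = key
    return sgb


sgb_dict = generate_sgb_dict_sketch()
print(sgb_dict["Fünften Buches Sozialgesetzbuch"])  # SGB-5
```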

quantlaw.de_extract.stemming module

quantlaw.de_extract.stemming.clean_name(name: str) → str[source]

Brings the name into a standard format by replacing multiple spaces and characters specific to the German language.
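A minimal sketch of such a normalization, assuming that umlauts and ‘ß’ are transliterated (ä → ae, ß → ss) and that runs of whitespace are collapsed. Which characters the real clean_name replaces, and with what, is not stated here, so both the mapping and the name clean_name_sketch are hypothetical.

```python
import re

# Hypothetical transliteration of German-specific characters.
_GERMAN_CHARS = str.maketrans(
    {"ä": "ae", "ö": "oe", "ü": "ue", "Ä": "Ae", "Ö": "Oe", "Ü": "Ue", "ß": "ss"}
)


def clean_name_sketch(name: str) -> str:
    """Transliterate German characters and collapse runs of whitespace."""
    name = name.translate(_GERMAN_CHARS)
    return re.sub(r"\s+", " ", name).strip()


print(clean_name_sketch("Bürgerliches   Gesetzbuch "))  # Buergerliches Gesetzbuch
```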

quantlaw.de_extract.stemming.stem_law_name(name)[source]

Stems the names of laws in preparation for recognizing them in the text.

Module contents