pqp.data package
Submodules
pqp.data.data module
- class pqp.data.data.Data(df, vars=None, validate_domain=True, **kwargs)
Bases: Result
Class representing datasets; it wraps a pandas DataFrame and holds some metadata.
The main job of this class is to validate or create a set of symbolic variables that align with the column names of df.
Examples:
>>> df = pd.DataFrame({"x": [1, 2, 3], "y": [4, 5, 6]})
>>> # it can infer a discrete domain for the variables
>>> Data(df, vars=[Variable("x"), Variable("y")])
>>> Data(df, vars={"x": Variable("X"), "y": Variable("Y")})

>>> # or you can specify the domain explicitly
>>> Data(df, vars={"x": CategoricalDomain([1, 2, 3, 4, 6]), "y": RealDomain([1, 10])})
>>> Data(df, vars={"x": "discrete", "y": "continuous"})
>>> Data(df, vars={"x": None, "y": None})  # specifies a domain of unknown type

>>> # examples of invalid calls
>>> Data(df, vars={"x": Variable("x"), "z": Variable("z")})  # z not in df
>>> Data(df, vars={"x": Variable("x"), "y": "some name"})  # "some name" is interpreted as a domain type
- Parameters:
df (pandas.DataFrame) – the dataset
vars (list or dict) – the variables in the dataset: either a list of Variable (names must match columns in df) or a dict mapping column names in df to a Variable, a Domain, a str specifying the type of the variable's domain, or None (a domain of unknown type)
validate_domain (bool) – if False, do not validate that the data conforms to the specified domains; defaults to True
silence_inferred_domain_warning (bool) – if True, silences the default warning issued when the domain of a variable is inferred
silence_unit_domain_warning (bool) – if True, silences the default warning issued when an inferred domain for a variable has only a single possible value
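The rules for the vars argument (list versus dict, column names that must exist in df, any string being read as a domain type) can be illustrated with a small stand-alone sketch. This is not pqp's implementation. `Variable` here is a hypothetical stand-in that only carries a name, and a plain list of column names replaces the DataFrame. Treating the make_domain option strings as the only valid domain types is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variable:
    """Hypothetical stand-in for pqp's Variable: only carries a name."""
    name: str


# Assumption: the valid domain type strings are the make_domain options.
DOMAIN_TYPES = {"discrete", "continuous", "binary", "integer", "infer"}


def normalize_vars(columns, vars):
    """Return a dict mapping each column name to its spec.

    Raises ValueError for names that are not columns of the dataset and for
    strings that are not a known domain type.
    """
    if isinstance(vars, list):
        # List form: each Variable's name must match a column name.
        mapping = {v.name: v for v in vars}
    else:
        # Dict form: keys are column names; values are Variable, domain, str or None.
        mapping = dict(vars)
    for col, spec in mapping.items():
        if col not in columns:
            raise ValueError(f"{col!r} is not a column of the dataset")
        # Any str is read as a domain type, never as a variable name.
        if isinstance(spec, str) and spec not in DOMAIN_TYPES:
            raise ValueError(f"{spec!r} is not a known domain type (column {col!r})")
    return mapping


cols = ["x", "y"]
print(normalize_vars(cols, [Variable("x"), Variable("y")]))
# {'x': Variable(name='x'), 'y': Variable(name='y')}
print(normalize_vars(cols, {"x": "discrete", "y": None}))
# {'x': 'discrete', 'y': None}
```

Both invalid calls from the examples above raise ValueError in this sketch: the first because "z" is not a column, the second because "some name" is not a domain type.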
- df
The dataset
- Type:
pandas.DataFrame
- vars
dict mapping variable names to Variable objects
- Type:
dict
- domain_of(var)
Get the domain of a variable
- Parameters:
var (str or Variable) – the variable whose domain to get
- Returns:
the domain of the variable
- Return type:
Domain
- property n
The number of rows in the dataset
- quantize(var, n_bins=2)
Quantize a variable in place
Turns a continuous variable into a categorical variable by binning it into n_bins bins.
- Parameters:
var (str or Variable) – the variable to quantize
n_bins (int) – the number of bins to quantize into
- Returns:
None
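The binning step behind quantize can be sketched as follows. The source does not say how bin edges are chosen. This sketch assumes equal-width bins between the minimum and maximum, similar to what pandas.cut does with an integer bins argument. `quantize_values` is a hypothetical helper: it returns a new list of bin indices, whereas Data.quantize modifies the dataset in place.

```python
def quantize_values(values, n_bins=2):
    """Map each value to an equal-width bin index in 0..n_bins-1.

    Hypothetical helper; returns a new list instead of modifying data in place.
    """
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")
    lo, hi = min(values), max(values)
    if lo == hi:
        # All values are identical, so they all fall in the first bin.
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # Clamp with min() so the maximum value lands in the last bin, not bin n_bins.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


print(quantize_values([0.0, 0.4, 0.6, 1.0], n_bins=2))  # [0, 0, 1, 1]
```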
pqp.data.domain module
- class pqp.data.domain.BinaryDomain
Bases: CategoricalDomain
Domain for a binary variable
- describe_assumptions()
Returns a human-readable description of assumptions made about the domain
- class pqp.data.domain.CategoricalDomain(values)
Bases: DiscreteDomain
Domain of a categorical variable
- describe_assumptions()
Returns a human-readable description of assumptions made about the domain
- get_cardinality()
Calculates the cardinality of the domain
- Returns:
the number of possible values in the domain
- Return type:
int
- get_values()
Returns a list of all possible values in the domain
- Returns:
all possible values in the domain
- Return type:
list
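For a categorical domain, get_values and get_cardinality can be read as "the distinct allowed values" and "how many there are". The following is a minimal stand-alone sketch of that reading, not pqp's class. `CategoricalDomainSketch` is hypothetical. Deduplicating the input while keeping first-seen order is an assumption about how repeated values are handled.

```python
class CategoricalDomainSketch:
    """Hypothetical stand-in for CategoricalDomain: an explicit set of values."""

    def __init__(self, values):
        # dict.fromkeys drops duplicates while keeping first-seen order.
        self._values = list(dict.fromkeys(values))

    def get_values(self):
        # Return a copy so callers cannot mutate the domain.
        return list(self._values)

    def get_cardinality(self):
        return len(self._values)

    def describe_assumptions(self):
        return f"value is one of {self._values}"


d = CategoricalDomainSketch(["a", "b", "a", "c"])
print(d.get_values(), d.get_cardinality())  # ['a', 'b', 'c'] 3
```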
- class pqp.data.domain.ContinuousDomain
Bases: Domain
Abstract base class for continuous domains
- get_cardinality()
The number of possible values a variable can take on
- class pqp.data.domain.DiscreteDomain
Bases: Domain, ABC
Abstract base class for discrete domains
- abstract get_cardinality()
Calculates the cardinality of the domain
- Returns:
the number of possible values in the domain
- Return type:
int
- abstract get_values()
Returns a list of all possible values in the domain
- Returns:
all possible values in the domain
- Return type:
list
- class pqp.data.domain.Domain
Bases: ABC
Manages the possible values that a Variable can take on
- abstract describe_assumptions()
Returns a human-readable description of assumptions made about the domain
- abstract get_cardinality()
The number of possible values a variable can take on
- validate(values)
Validates that every value in a list is in the domain
- Parameters:
values (list) – a list of values to validate
- Returns:
True if all values are in the domain, False otherwise
- Return type:
bool
- validate_or_throw(values)
Validates that every value in a list is in the domain and raises an error if not
- Parameters:
values (list) – a list of values to validate
- Raises:
ValueError – if any value is not in the domain
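Because validate and validate_or_throw are concrete on the abstract Domain class, one plausible design is to build both on a per-value membership check that each subclass supplies. The sketch below shows that pattern with Python's abc module. It is an illustration, not pqp's code: `DomainSketch`, its `contains` hook, and the toy `PositiveRealsSketch` subclass are all hypothetical, and the source does not confirm any per-value hook.

```python
from abc import ABC, abstractmethod


class DomainSketch(ABC):
    """Hypothetical sketch of the Domain contract (not pqp's code)."""

    @abstractmethod
    def contains(self, value):
        """Hypothetical hook: True if a single value lies in the domain."""

    @abstractmethod
    def describe_assumptions(self):
        """Human-readable description of the domain's assumptions."""

    @abstractmethod
    def get_cardinality(self):
        """Number of possible values."""

    def validate(self, values):
        # Generic check shared by all subclasses: every value must be in the domain.
        return all(self.contains(v) for v in values)

    def validate_or_throw(self, values):
        # Same check, but report the first offending value via ValueError.
        for v in values:
            if not self.contains(v):
                raise ValueError(f"value {v!r} is not in the domain")


class PositiveRealsSketch(DomainSketch):
    """Toy concrete domain used only to exercise the base class."""

    def contains(self, value):
        return value > 0

    def describe_assumptions(self):
        return "values are strictly positive real numbers"

    def get_cardinality(self):
        return float("inf")


d = PositiveRealsSketch()
print(d.validate([1, 2.5]), d.validate([1, -1]))  # True False
```

Instantiating DomainSketch directly raises TypeError, which mirrors how the abstract methods force each concrete domain to define its own membership rule.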
- class pqp.data.domain.IntegerDomain(values)
Bases: DiscreteDomain
- check_int(val)
Checks whether a value is an integer
- describe_assumptions()
Returns a human-readable description of assumptions made about the domain
- get_cardinality()
Calculates the cardinality of the domain
- Returns:
the number of possible values in the domain
- Return type:
int
- get_values()
Returns a list of all possible values in the domain
- Returns:
all possible values in the domain
- Return type:
list
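The make_domain options say that an "integer" domain is specified by the min and max of its values. A natural reading is that every integer in that inclusive range is allowed. The sketch below follows that reading, which is an assumption. `IntegerDomainSketch` is hypothetical. Its check_int accepts integral floats such as 3.0 and rejects bools, and both of those choices are also assumptions.

```python
class IntegerDomainSketch:
    """Hypothetical stand-in for IntegerDomain: every integer from min to max."""

    def __init__(self, values):
        bad = [v for v in values if not self.check_int(v)]
        if bad:
            raise ValueError(f"non-integer values: {bad}")
        # Assumption (from make_domain's "integer" option): only min and max matter.
        self.min, self.max = int(min(values)), int(max(values))

    @staticmethod
    def check_int(val):
        """True for ints and integral floats like 3.0; bools are rejected."""
        if isinstance(val, bool):
            return False
        if isinstance(val, int):
            return True
        return isinstance(val, float) and val.is_integer()

    def get_values(self):
        # Inclusive range, so max itself is a possible value.
        return list(range(self.min, self.max + 1))

    def get_cardinality(self):
        return self.max - self.min + 1


d = IntegerDomainSketch([3, 1, 5])
print(d.get_values(), d.get_cardinality())  # [1, 2, 3, 4, 5] 5
```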
- class pqp.data.domain.RealDomain(values)
Bases: ContinuousDomain
A continuous domain for a variable, delimited by min and max values
- Parameters:
values (list) – the min and max of this list are taken as the min and max of the domain
- describe_assumptions()
Returns a human-readable description of assumptions made about the domain
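RealDomain keeps only the extremes of the list it is given. A minimal stand-alone sketch of that behavior follows. `RealDomainSketch` is hypothetical, and treating both bounds as inclusive when validating is an assumption the source does not state.

```python
class RealDomainSketch:
    """Hypothetical stand-in for RealDomain: an interval [min, max]."""

    def __init__(self, values):
        # As documented, the min and max of the list delimit the domain.
        self.min, self.max = min(values), max(values)

    def validate(self, values):
        # Assumption: both bounds are inclusive.
        return all(self.min <= v <= self.max for v in values)

    def describe_assumptions(self):
        return f"real value in [{self.min}, {self.max}]"


d = RealDomainSketch([10, 1, 4.5])
print(d.min, d.max, d.validate([1, 5.5, 10]), d.validate([0.5]))  # 1 10 True False
```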
- pqp.data.domain.infer_domain_type(vals)
Attempts to infer the best domain type to use for a list of values
The heuristics are somewhat complicated, so run this and inspect the result before relying on it.
- Parameters:
vals (Iterable) – the values whose domain type to infer
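The documented heuristics are more involved than anything shown here. Still, a deliberately simplified, hypothetical heuristic shows the kind of decision infer_domain_type makes. `infer_domain_type_sketch` is not pqp's function. It assumes the result is one of the make_domain type strings and applies its rules in a fixed order: exactly two distinct values, then non-numeric, then all integral, then everything else.

```python
def infer_domain_type_sketch(vals):
    """Return a make_domain type string using a deliberately simple heuristic."""
    vals = list(vals)
    if len(set(vals)) == 2:
        # Exactly two distinct values: treat as binary.
        return "binary"
    # bool is a subclass of int, so exclude it explicitly from "numeric".
    numeric = all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in vals)
    if not numeric:
        return "discrete"
    if all(float(v).is_integer() for v in vals):
        return "integer"
    return "continuous"


for sample in ([0, 1, 1, 0], [1, 2, 3], [0.5, 1.5, 2.25], ["a", "b", "c"]):
    print(sample, "->", infer_domain_type_sketch(sample))
```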
- pqp.data.domain.make_domain(domain_type, values=None)
Generates a domain object from a string and optional values
- Options:
"discrete": values must be specified
"continuous": the min and max of values are used to specify the domain
"binary": values are ignored
"integer": the min and max of values are used to specify the domain
"infer": attempt to guess the domain type
- Parameters:
domain_type (str) – one of "discrete", "continuous", "binary", "integer", or "infer"
values (list) – the values of the domain
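The options above amount to a dispatch on the type string. The sketch below shows that dispatch without depending on pqp. `DomainSpec` is a hypothetical plain-data stand-in for the domain objects make_domain returns, and `_guess_type` is a crude stand-in for infer_domain_type. Requiring values for every type except "binary" is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DomainSpec:
    """Hypothetical plain-data stand-in for the Domain objects make_domain returns."""
    kind: str
    values: tuple


def _guess_type(values):
    # Deliberately crude stand-in for infer_domain_type.
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
        return "integer" if all(float(v).is_integer() for v in values) else "continuous"
    return "discrete"


def make_domain_sketch(domain_type, values=None):
    if domain_type == "infer":
        if not values:
            raise ValueError("'infer' needs values to guess from")
        domain_type = _guess_type(values)
    if domain_type == "binary":
        # Values are ignored for binary domains.
        return DomainSpec("binary", ())
    # Assumption: every other type needs values here.
    if not values:
        raise ValueError(f"{domain_type!r} domains need values")
    if domain_type == "discrete":
        # Keep every distinct value, in first-seen order.
        return DomainSpec("discrete", tuple(dict.fromkeys(values)))
    if domain_type in ("continuous", "integer"):
        # Only the min and max of values are kept.
        return DomainSpec(domain_type, (min(values), max(values)))
    raise ValueError(f"unknown domain type {domain_type!r}")


print(make_domain_sketch("integer", [3, 1, 7]))  # DomainSpec(kind='integer', values=(1, 7))
```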