pqp.data package¶
Submodules¶
pqp.data.data module¶
- class pqp.data.data.Data(df, vars=None, validate_domain=True, **kwargs)¶
Bases:
Result
Class representing a dataset. It wraps a pandas DataFrame and carries some metadata.
The main job of this class is to validate or create a set of symbolic variables that align with the column names of df.
Examples:
>>> import pandas as pd
>>> df = pd.DataFrame({"x": [1, 2, 3], "y": [4, 5, 6]})
>>> # it can infer a discrete domain for the variables
>>> Data(df, vars=[Variable("x"), Variable("y")])
>>> Data(df, vars={"x": Variable("X"), "y": Variable("Y")})
>>> # or you can specify the domain explicitly
>>> Data(df, vars={"x": DiscreteDomain([1, 2, 3, 4, 6]), "y": ContinuousDomain([1, 10])})
>>> Data(df, vars={"x": "discrete", "y": "continuous"})
>>> Data(df, vars={"x": None, "y": None})  # specifies a domain of unknown type
>>> # examples of invalid calls
>>> Data(df, vars={"x": Variable("x"), "z": Variable("z")})  # z not in df
>>> Data(df, vars={"x": Variable("x"), "y": "some name"})  # "some name" is interpreted as a domain type
- Parameters:
df (pandas.DataFrame) – the dataset
vars (list or dict) – the variables in the dataset, either a list of Variable (names must match columns in df) or a dict mapping column names in df to Variable or str specifying the type of the variable’s domain
validate_domain (bool) – if False, do not validate that the data conforms to the domains specified, defaults to True
silence_inferred_domain_warning (bool) – if True, silences the default warning when the domain of a variable is inferred
silence_unit_domain_warning (bool) – if True, silences the default warning when an inferred domain for a variable has only a single possible value
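The requirement that variable names match columns in df can be pictured as a simple membership check. The sketch below uses only pandas; `missing_columns` is a hypothetical helper written for illustration and is not part of pqp, whose actual validation logic is not shown in this reference.

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3], "y": [4, 5, 6]})


def missing_columns(df, var_names):
    """Hypothetical helper: names in var_names that are not columns of df."""
    return [name for name in var_names if name not in df.columns]


# Every name matches a column, so nothing is missing.
valid = missing_columns(df, ["x", "y"])

# "z" is not a column of df; this is the situation the invalid call
# Data(df, vars={"x": ..., "z": ...}) runs into.
invalid = missing_columns(df, ["x", "z"])

print(valid, invalid)
```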
- df¶
The dataset
- Type:
pandas.DataFrame
- vars¶
dict mapping variable names to Variable objects
- Type:
dict
- domain_of(var)¶
Get the domain of a variable
- Parameters:
var (str or Variable) – the variable whose domain to get
- Returns:
the domain of the variable
- Return type:
Domain
- property n¶
The number of rows in the dataset
- quantize(var, n_bins=2)¶
Quantize a variable in place
Turns a continuous variable into a categorical variable by binning it into n_bins bins.
- Parameters:
var (str or Variable) – the variable to quantize
n_bins (int) – the number of bins to quantize into
- Returns:
None
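The binning step can be pictured with plain pandas. This is a sketch under an assumption: it uses equal-width bins from `pandas.cut`, while this reference does not say whether quantize uses equal-width bins, quantiles, or another rule. The column name `height` is hypothetical.

```python
import pandas as pd

# Hypothetical continuous column; not taken from the pqp documentation.
df = pd.DataFrame({"height": [150.0, 160.0, 170.0, 180.0, 190.0, 200.0]})

n_bins = 2

# Equal-width binning (an assumption; quantize may bin differently).
# labels=False returns the integer index of the bin each row falls into.
df["height"] = pd.cut(df["height"], bins=n_bins, labels=False)

print(df["height"].tolist())
```

After this step the column holds one of n_bins category codes per row instead of a continuous measurement. The assignment overwrites the original column, much as an in-place operation would.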
pqp.data.domain module¶
- class pqp.data.domain.BinaryDomain¶
Bases:
CategoricalDomain
Domain for a binary variable
- describe_assumptions()¶
Returns a human-readable description of assumptions made about the domain
- class pqp.data.domain.CategoricalDomain(values)¶
Bases:
DiscreteDomain
Domain of a categorical variable
- describe_assumptions()¶
Returns a human-readable description of assumptions made about the domain
- get_cardinality()¶
Calculates the cardinality of the domain
- Returns:
the number of possible values in the domain
- Return type:
int
- get_values()¶
Returns a list of all possible values in the domain
- Returns:
all possible values in the domain
- Return type:
list
- class pqp.data.domain.ContinuousDomain¶
Bases:
Domain
Abstract base class for continuous domains
- get_cardinality()¶
The number of possible values a variable can take on
- class pqp.data.domain.DiscreteDomain¶
Bases:
Domain, ABC
Abstract base class for discrete domains
- abstract get_cardinality()¶
Calculates the cardinality of the domain
- Returns:
the number of possible values in the domain
- Return type:
int
- abstract get_values()¶
Returns a list of all possible values in the domain
- Returns:
all possible values in the domain
- Return type:
list
- class pqp.data.domain.Domain¶
Bases:
ABC
Manages the possible values that a Variable can take on
- abstract describe_assumptions()¶
Returns a human-readable description of assumptions made about the domain
- abstract get_cardinality()¶
The number of possible values a variable can take on
- validate(values)¶
Validates that a list of values is in the domain
- Parameters:
values (list) – a list of values to validate
- Returns:
True if all values are in the domain, False otherwise
- Return type:
bool
- validate_or_throw(values)¶
Validates that a list of values is in the domain; raises an error if not
- Parameters:
values (list) – a list of values to validate
- Raises:
ValueError – if any value is not in the domain
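The two validation methods differ only in how they report failure: one returns a bool, the other raises. A minimal sketch of that relationship follows; `ToyDomain` is a hypothetical stand-in that stores an explicit set of allowed values, and is not the pqp implementation.

```python
class ToyDomain:
    """Hypothetical stand-in for a domain; not the pqp implementation."""

    def __init__(self, values):
        self.values = set(values)

    def validate(self, values):
        # True only if every value belongs to the domain.
        return all(v in self.values for v in values)

    def validate_or_throw(self, values):
        # Same check, but failure is reported as an exception.
        if not self.validate(values):
            bad = [v for v in values if v not in self.values]
            raise ValueError(f"values not in domain: {bad}")


domain = ToyDomain([1, 2, 3])
ok = domain.validate([1, 2])      # all values are in the domain
not_ok = domain.validate([1, 4])  # 4 is outside the domain

try:
    domain.validate_or_throw([1, 4])
    raised = False
except ValueError:
    raised = True
```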
- class pqp.data.domain.IntegerDomain(values)¶
Bases:
DiscreteDomain
- check_int(val)¶
Checks whether a value is an integer
- describe_assumptions()¶
Returns a human-readable description of assumptions made about the domain
- get_cardinality()¶
Calculates the cardinality of the domain
- Returns:
the number of possible values in the domain
- Return type:
int
- get_values()¶
Returns a list of all possible values in the domain
- Returns:
all possible values in the domain
- Return type:
list
- class pqp.data.domain.RealDomain(values)¶
Bases:
ContinuousDomain
A continuous domain for a variable, delimited by min and max values
- Parameters:
values (list) – the min and max of this list are taken as the min and max of the domain
- describe_assumptions()¶
Returns a human-readable description of assumptions made about the domain
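The way a list of values delimits a real-valued domain can be sketched in a few lines of plain Python. The names `lower`, `upper`, and `in_interval` are hypothetical, and treating the interval as closed is an assumption; this reference does not say whether the endpoints are included.

```python
values = [3.5, 1.0, 10.0, 7.2]

# Only the extremes of the list matter; the other values are ignored.
lower, upper = min(values), max(values)


def in_interval(x):
    # Assumes a closed interval [lower, upper] (not confirmed by the source).
    return lower <= x <= upper


print(lower, upper, in_interval(5), in_interval(11))
```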
- pqp.data.domain.infer_domain_type(vals)¶
Attempts to infer the best domain type to use for a list of values
The heuristics are somewhat complicated. You should run this and inspect the result before use.
- Parameters:
vals (Iterable) – a list of values to infer the domain type of
- pqp.data.domain.make_domain(domain_type, values=None)¶
Generates a domain object from a string and optional values
- Options:
"discrete": values must be specified
"continuous": min and max of values used to specify domain
"binary": values are ignored
"integer": min and max of values used to specify domain
"infer": attempt to guess
- Parameters:
domain_type (str) – one of "discrete", "continuous", "binary", "integer", or "infer"
values (list) – the values of the domain
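The option strings above amount to a dispatch on domain_type, with values used differently in each branch. The sketch below mirrors that table with a hypothetical helper, `summarize_domain_request`, which returns plain tuples rather than domain objects. It is not pqp code. Raising ValueError for missing values or an unknown type is an assumption, and the "infer" branch simply passes values through because the guessing heuristics are not described here.

```python
KNOWN_TYPES = {"discrete", "continuous", "binary", "integer", "infer"}


def summarize_domain_request(domain_type, values=None):
    """Hypothetical helper mirroring the documented options; not pqp code."""
    if domain_type not in KNOWN_TYPES:
        raise ValueError(f"unknown domain type: {domain_type!r}")
    if domain_type == "binary":
        # values are ignored for binary domains
        return ("binary", None)
    if domain_type == "infer":
        # the real function would guess a type here; this sketch defers
        return ("infer", values)
    if values is None:
        # assumption: the remaining options cannot work without values
        raise ValueError(f"{domain_type!r} requires values")
    if domain_type == "discrete":
        # the listed values themselves make up the domain
        return ("discrete", list(values))
    # "continuous" and "integer": only the extremes of values matter
    return (domain_type, (min(values), max(values)))


print(summarize_domain_request("continuous", [4, 1, 10]))
print(summarize_domain_request("binary", [4, 1, 10]))
```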