Data
- class pqp.data.Data(df, vars=None, validate_domain=True, **kwargs)
Bases: Result
Class representing a dataset. It wraps a pandas DataFrame and carries some metadata.
The main job of this class is to validate or create a set of symbolic variables that align with the column names of df.
Examples:
>>> import pandas as pd
>>> df = pd.DataFrame({"x": [1, 2, 3], "y": [4, 5, 6]})
>>> # it can infer a discrete domain for the variables
>>> Data(df, vars=[Variable("x"), Variable("y")])
>>> Data(df, vars={"x": Variable("X"), "y": Variable("Y")})

>>> # or you can specify the domain explicitly
>>> Data(df, vars={"x": DiscreteDomain([1, 2, 3, 4, 6]), "y": ContinuousDomain([1, 10])})
>>> Data(df, vars={"x": "discrete", "y": "continuous"})
>>> Data(df, vars={"x": None, "y": None})  # specifies a domain of unknown type

>>> # examples of invalid calls
>>> Data(df, vars={"x": Variable("x"), "z": Variable("z")})  # z not in df
>>> Data(df, vars={"x": Variable("x"), "y": "some name"})  # "some name" is interpreted as a domain type
- Parameters:
  df (pandas.DataFrame) – the dataset
  vars (list or dict) – the variables in the dataset, either a list of Variable (names must match columns in df) or a dict mapping column names in df to a Variable, a domain object, a str naming the type of the variable’s domain (for example "discrete" or "continuous"), or None for a domain of unknown type
  validate_domain (bool) – if False, do not validate that the data conforms to the specified domains; defaults to True
  silence_inferred_domain_warning (bool) – if True, silences the default warning when the domain of a variable is inferred
  silence_unit_domain_warning (bool) – if True, silences the default warning when an inferred domain for a variable has only a single possible value
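The checks described above (variable names must match columns, and a discrete domain can be inferred from the data) can be illustrated with a small standalone sketch. This is not pqp's implementation: `check_vars_match_columns` and `infer_discrete_domain` are hypothetical helpers, a plain dict of lists stands in for the DataFrame, and the warning text is invented for illustration.

```python
import warnings


def check_vars_match_columns(columns, var_names):
    """Hypothetical helper: raise ValueError if a variable name is not a column."""
    missing = [name for name in var_names if name not in columns]
    if missing:
        raise ValueError(f"variables not found in dataset columns: {missing}")
    return True


def infer_discrete_domain(values, silence_unit_domain_warning=False):
    """Hypothetical helper: infer a discrete domain as the sorted distinct values."""
    domain = sorted(set(values))
    # Assumption: a single-valued domain is worth a warning unless silenced,
    # mirroring the silence_unit_domain_warning option documented above.
    if len(domain) == 1 and not silence_unit_domain_warning:
        warnings.warn("inferred domain has a single possible value")
    return domain


# A dict of lists stands in for the DataFrame from the examples.
columns = {"x": [1, 2, 3], "y": [4, 5, 6]}
check_vars_match_columns(columns, ["x", "y"])
domain_x = infer_discrete_domain(columns["x"])
```

Calling `check_vars_match_columns(columns, ["x", "z"])` raises `ValueError`, which corresponds to the "z not in df" invalid call in the examples.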
- df
The dataset
- Type: pandas.DataFrame
- vars
dict mapping variable names to Variable objects
- Type: dict
Attributes Summary
n: The number of rows in the dataset
Methods Summary
domain_of(var): Get the domain of a variable
quantize(var[, n_bins]): Quantize a variable in place
Attributes Documentation
- n
The number of rows in the dataset
Methods Documentation
- domain_of(var)
Get the domain of a variable
- Parameters:
  var (str or Variable) – the variable whose domain to get
- Returns: the domain of the variable
- Return type: Domain
- quantize(var, n_bins=2)
Quantize a variable in place
Turns a continuous variable into a categorical variable by binning it into n_bins bins.
- Parameters:
  var (str or Variable) – the variable to quantize
  n_bins (int) – the number of bins to quantize into
- Returns: None
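To make the idea of binning concrete, here is a minimal sketch of equal-width binning in plain Python. The documentation does not say which binning strategy quantize uses (equal-width, quantile-based, or something else), so equal-width is an assumption here, and `equal_width_bins` is a hypothetical helper rather than part of pqp.

```python
def equal_width_bins(values, n_bins=2):
    """Hypothetical helper: map each value to a bin index in range(n_bins)."""
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")
    lo, hi = min(values), max(values)
    if lo == hi:
        # All values are identical, so everything falls in the first bin.
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum lands exactly on the upper edge, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


heights = [1.0, 2.5, 4.0, 7.5, 10.0]
labels = equal_width_bins(heights, n_bins=2)
print(labels)  # [0, 0, 0, 1, 1]
```

Unlike quantize, which is documented as working in place and returning None, this sketch returns a new list of bin indices so the result is easy to inspect.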