cgnal.core.data.model package
Submodules
cgnal.core.data.model.core module
Module with base abstraction of common objects.
- class cgnal.core.data.model.core.BaseIterable(*args, **kwds)
Bases:
Generic[cgnal.core.data.model.core.T]
Abstract class defining interface for iterables.
- abstract property cached: bool
Whether the iterable is cached in memory or lazy.
- Returns
boolean indicating whether iterable is fully-stored in memory
- abstract classmethod empty() cgnal.core.data.model.core.BaseIterableType
Return an empty iterable instance.
- abstract property items: Iterable[cgnal.core.data.model.core.T]
Return an iterator over the items.
- Returns
Iterable[T]
- abstract property type: Type[cgnal.core.data.model.core.T]
Return the type of the objects in the Iterable.
- class cgnal.core.data.model.core.BaseRange
Bases:
abc.ABC
Abstract Range Class.
- property business_days: List[pandas._libs.tslibs.timestamps.Timestamp]
Create date range with daily frequency.
- Returns
list of pd.Timestamp from start to end with daily frequency including only days from Mon to Fri
- property days: List[pandas._libs.tslibs.timestamps.Timestamp]
Create date range with daily frequency.
- Returns
list of pd.Timestamp from start to end with daily frequency
- abstract property end: pandas._libs.tslibs.timestamps.Timestamp
Return the last timestamp.
- Returns
Timestamp
- property minutes_15: List[pandas._libs.tslibs.timestamps.Timestamp]
Create date range with 15-minute frequency.
- Returns
list of pd.Timestamp from start to end with 15 minutes frequency
- abstract overlaps(other: cgnal.core.data.model.core.BaseRange) bool
Return whether two ranges overlap.
- Parameters
other – other range to be compared with
- Returns
True if the two ranges intersect, False otherwise
- abstract range(freq='H') List[pandas._libs.tslibs.timestamps.Timestamp]
Return list of timestamps, spaced by given frequency.
- Parameters
freq – frequency of timestamps, valid values are “D” (day), “H” (hour), “M” (minute), “S” (second).
- Returns
list of timestamps
- abstract property start: pandas._libs.tslibs.timestamps.Timestamp
Return the first timestamp.
- Returns
Timestamp
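As a rough illustration of what the days, business_days and minutes_15 properties describe, the sketch below builds timestamp lists with the standard library only. It is not the cgnal implementation (which returns pd.Timestamp objects); timestamps_between and business_days_between are hypothetical helper names, and including the end point is an assumption.

```python
from datetime import datetime, timedelta
from typing import List


def timestamps_between(start: datetime, end: datetime, step: timedelta) -> List[datetime]:
    """Return timestamps from start to end (end included, an assumption) spaced by step."""
    out = []
    current = start
    while current <= end:
        out.append(current)
        current += step
    return out


def business_days_between(start: datetime, end: datetime) -> List[datetime]:
    """Keep only Monday-to-Friday days (weekday() is 0 for Monday, 6 for Sunday)."""
    daily = timestamps_between(start, end, timedelta(days=1))
    return [d for d in daily if d.weekday() < 5]


start = datetime(2023, 1, 6)  # a Friday
end = datetime(2023, 1, 9)    # the following Monday
days = timestamps_between(start, end, timedelta(days=1))
business_days = business_days_between(start, end)
quarter_hours = timestamps_between(start, start + timedelta(hours=1), timedelta(minutes=15))
```

Here days covers Friday to Monday, business_days drops the weekend, and quarter_hours spans one hour in 15-minute steps.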
- class cgnal.core.data.model.core.CachedIterable(items: Sequence[cgnal.core.data.model.core.T])
Bases:
cgnal.core.data.model.core.BaseIterable[cgnal.core.data.model.core.T]
Base class to be used for implementing cached iterables.
Return instance of a class to be used for implementing cached iterables.
- Parameters
items – sequence or iterable of elements
- property cached: bool
Whether the iterable is cached in memory or lazy.
- Returns
boolean indicating whether iterable is fully-stored in memory
- classmethod empty() cgnal.core.data.model.core.CachedIterableType
Return an empty cached iterable.
- Returns
Empty instance
- classmethod from_iterable(iterable: cgnal.core.data.model.core.BaseIterable[cgnal.core.data.model.core.T]) cgnal.core.data.model.core.CachedIterableType
Create a new instance of this class from a BaseIterable instance.
- Parameters
iterable – iterable instance
- Returns
cached iterable
- property items: Sequence[cgnal.core.data.model.core.T]
Return an iterator over the items.
- Returns
Iterable[T]
- class cgnal.core.data.model.core.CompositeRange(ranges: List[cgnal.core.data.model.core.Range])
Bases:
cgnal.core.data.model.core.BaseRange
Class representing a composition of ranges.
Return a range made up of multiple ranges.
- Parameters
ranges – List of Ranges
- property end: pandas._libs.tslibs.timestamps.Timestamp
Return the last timestamp.
- Returns
Timestamp
- overlaps(other: cgnal.core.data.model.core.BaseRange) bool
Return whether two ranges overlap.
- Parameters
other – BaseRange, other range to be compared with
- Returns
bool, True if the two ranges intersect, False otherwise
- range(freq='H') List[pandas._libs.tslibs.timestamps.Timestamp]
Return list of timestamps, spaced by given frequency.
- Parameters
freq – given frequency
- Returns
list of timestamps
- simplify() Union[cgnal.core.data.model.core.CompositeRange, cgnal.core.data.model.core.Range]
Simplify the list into disjoint Range objects, aggregating non-disjoint ranges.
If only one range would be present, a simple Range object is returned.
- Returns
BaseRange
- property start: pandas._libs.tslibs.timestamps.Timestamp
Return the first timestamp.
- Returns
Timestamp
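The behaviour described for simplify() (aggregating non-disjoint ranges into disjoint ones) can be sketched as a classic interval merge. This is only an illustration under assumptions: intervals are plain (start, end) integer tuples rather than Range objects, touching intervals are treated as overlapping, and the function name simplify is reused here purely for readability.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # hypothetical stand-in for a Range (start, end)


def simplify(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping intervals into a sorted list of disjoint ones."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


simplified = simplify([(8, 10), (1, 3), (2, 5)])
single = simplify([(1, 4), (3, 6)])
```

When the merge leaves a single interval, as in the second call, the documented method returns a simple Range instead of a CompositeRange.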
- class cgnal.core.data.model.core.DillSerialization
Bases:
cgnal.core.data.model.core.Serializable
Serialization based on dill package.
- classmethod load(filename: Union[str, os.PathLike[str]]) DillSerialization
Load instance from file.
- Parameters
filename – Name of the file to be read
- Returns
Instance of the read Model
- write(filename: Union[str, os.PathLike[str]]) None
Write instance as pickle.
- Parameters
filename – Name of the file where to save the instance
- class cgnal.core.data.model.core.IterGenerator(generator_function: Callable[[], Iterator[cgnal.core.data.model.core.T]], _type: Optional[Type[cgnal.core.data.model.core.T]] = None)
Bases:
Generic
[cgnal.core.data.model.core.T
]Base class representing any generator.
Class that allows a given generator to be accessed as an Iterator via .iterator property.
- Parameters
generator_function – function that outputs a generator
_type – type returned by the generator, required when the generator is empty
- Raises
TypeError – when type mismatch happens between generator and provided type
ValueError – when an empty generator is provided without _type specification
- property iterator: Iterator[cgnal.core.data.model.core.T]
Return an iterator over the given generator function.
- Returns
an iterator
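The reason for wrapping a generator function rather than a generator is that a generator can be consumed only once, while calling the function again produces a fresh one. The sketch below shows that idea with a hypothetical ReiterableGenerator class; it omits the type checks that IterGenerator performs and is not the library's code.

```python
from typing import Callable, Generic, Iterator, TypeVar

T = TypeVar("T")


class ReiterableGenerator(Generic[T]):
    """Hypothetical stand-in: stores a generator function, not a generator."""

    def __init__(self, generator_function: Callable[[], Iterator[T]]) -> None:
        self.generator_function = generator_function

    @property
    def iterator(self) -> Iterator[T]:
        # Calling the function again yields a fresh generator each time.
        return self.generator_function()


def squares() -> Iterator[int]:
    for i in range(3):
        yield i * i


gen = ReiterableGenerator(squares)
first_pass = list(gen.iterator)
second_pass = list(gen.iterator)
```

Both passes see the same elements, which would not hold if a single generator object were stored and reused.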
- class cgnal.core.data.model.core.IterableUtilsMixin(*args, **kwargs)
Bases:
Generic[cgnal.core.data.model.core.T, cgnal.core.data.model.core.LazyIterableType, cgnal.core.data.model.core.CachedIterableType], cgnal.core.data.model.core.BaseIterable[cgnal.core.data.model.core.T], abc.ABC
Class providing base interfaces and methods that enhance iterable classes and enable a more functional style.
In particular, the class provides implementations of the map, filter and foreach methods, among others.
Create a new instance of this class.
- Parameters
cls – parent object class
args – passed to the super class __new__ method
kwargs – passed to the super class __new__ method
- Raises
RuntimeError – if the cached and lazy versions were not defined before instantiating the class
- Returns
an instance of this class
- batch(size: int = 100) Iterator[cgnal.core.data.model.core.CachedIterableType]
Return an iterator of batches of size size.
- Parameters
size – dimension of the batch
- Yield
iterator of batches
- cached_type: Type[cgnal.core.data.model.core.CachedIterableType]
- filter(f: Callable[[cgnal.core.data.model.core.T], bool]) cgnal.core.data.model.core.LazyIterableType
Return an iterable where elements have been filtered based on a boolean function.
- Parameters
f – boolean function that selects items
- Returns
lazy iterable with elements filtered
- foreach(f: Callable[[cgnal.core.data.model.core.T], Any])
Execute the provided function on each element of the iterable.
- Parameters
f – function to be executed for each element
- from_element(value: cgnal.core.data.model.core.T, cached=True) Union[cgnal.core.data.model.core.LazyIterableType, cgnal.core.data.model.core.CachedIterableType]
Instantiate a new object of this class from a single element.
- Parameters
value – element
cached – whether a cached iterable should be returned, defaults to True
- Returns
iterable object
- lazy_type: Type[cgnal.core.data.model.core.LazyIterableType]
- map(f: Callable[[cgnal.core.data.model.core.T], cgnal.core.typing.T_co]) cgnal.core.data.model.core.LazyIterableType
Map all elements of an iterable with the provided function.
- Parameters
f – function to be used to map the elements
- Returns
mapped iterable
- take(size: int) cgnal.core.data.model.core.CachedIterableType
Take the first size elements of the iterable.
- Parameters
size – number of elements to be taken
- Returns
cached iterable with the first elements
- to_cached() cgnal.core.data.model.core.CachedIterableType
Create a new cached instance of this instance.
- Returns
cached iterable
- to_lazy() cgnal.core.data.model.core.LazyIterableType
Create a new lazy instance of this instance.
- Returns
lazy iterable
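The split between lazy operations (map, filter) and materializing ones (batch, take) can be illustrated with plain Python iterators. The helpers below are hypothetical free functions, not the mixin's methods, and they return lists where the library returns cached iterables.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batch(items: Iterable[T], size: int = 100) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` elements."""
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def take(items: Iterable[T], size: int) -> List[T]:
    """Materialize only the first `size` elements."""
    return list(islice(items, size))


numbers = range(10)
evens = filter(lambda x: x % 2 == 0, numbers)  # lazy: nothing is computed yet
doubled = map(lambda x: 2 * x, evens)          # lazy: still nothing computed
batches = list(batch(doubled, size=2))         # consuming the batches runs the pipeline
first_three = take(range(10), 3)
```

Note that the last batch may be shorter than size when the number of elements is not a multiple of it.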
- class cgnal.core.data.model.core.LazyIterable(items: cgnal.core.data.model.core.IterGenerator)
Bases:
cgnal.core.data.model.core.BaseIterable[cgnal.core.data.model.core.T]
Base class to be used for implementing lazy iterables.
Return an instance of the class to be used for implementing lazy iterables.
- Parameters
items – IterGenerator containing the generator of items
- property cached: bool
Whether the iterable is cached in memory or lazy.
- Returns
boolean indicating whether iterable is fully-stored in memory
- classmethod empty() cgnal.core.data.model.core.LazyIterableType
Return an empty lazy iterable.
- Returns
Empty instance
- classmethod from_iterable(iterable: cgnal.core.data.model.core.BaseIterable[cgnal.core.data.model.core.T]) cgnal.core.data.model.core.LazyIterableType
Create a new instance of this class from a BaseIterable instance.
- Parameters
iterable – iterable instance
- Returns
lazy iterable
- property items: Iterator[cgnal.core.data.model.core.T]
Return an iterator over the items.
- Returns
Iterable[T]
- class cgnal.core.data.model.core.PickleSerialization
Bases:
cgnal.core.data.model.core.Serializable
Serialization based on pickle package.
- classmethod load(filename: Union[str, os.PathLike[str]]) PickleSerialization
Load instance from pickle.
- Parameters
filename – Name of the file to be read
- Returns
Instance of the read Model
- write(filename: Union[str, os.PathLike[str]]) None
Write instance as pickle.
- Parameters
filename – Name of the file where to save the instance
- class cgnal.core.data.model.core.Range(start: pandas.core.tools.datetimes.DatetimeScalar, end: pandas.core.tools.datetimes.DatetimeScalar)
Bases:
cgnal.core.data.model.core.BaseRange
Base class for a continuous range.
Return a simple Range Class.
- Parameters
start – starting datetime for the range
end – ending datetime for the range
- Raises
ValueError – if start > end
- property end: pandas._libs.tslibs.timestamps.Timestamp
Return the last timestamp.
- Returns
Timestamp
- overlaps(other: cgnal.core.data.model.core.BaseRange) bool
Return whether two ranges overlap.
- Parameters
other – other range to be compared with
- Returns
True if the two ranges overlap, False otherwise
- range(freq='H') List[pandas._libs.tslibs.timestamps.Timestamp]
Return list of timestamps, spaced by given frequency.
- Parameters
freq – given frequency
- Returns
list of timestamps
- property start: pandas._libs.tslibs.timestamps.Timestamp
Return the first timestamp.
- Returns
Timestamp
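A minimal sketch of the two rules documented for Range, namely that construction fails when start > end and that overlaps reports whether two ranges intersect. SimpleRange is a hypothetical stand-in built on datetime, and treating both end points as inclusive is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class SimpleRange:
    """Hypothetical stand-in for a continuous range with start <= end."""

    start: datetime
    end: datetime

    def __post_init__(self) -> None:
        if self.start > self.end:
            raise ValueError("start must not be later than end")

    def overlaps(self, other: "SimpleRange") -> bool:
        # Two closed intervals intersect when each starts before the other ends.
        return self.start <= other.end and other.start <= self.end


january = SimpleRange(datetime(2023, 1, 1), datetime(2023, 1, 31))
mid_month = SimpleRange(datetime(2023, 1, 15), datetime(2023, 2, 15))
march = SimpleRange(datetime(2023, 3, 1), datetime(2023, 3, 31))
```

With these values, january.overlaps(mid_month) is true and january.overlaps(march) is false; the check is symmetric in its two arguments.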
- class cgnal.core.data.model.core.RegisterLazyCachedIterables(class_object_first: Type[cgnal.core.data.model.core.IterableUtilsMixin], unidirectional_link: bool = False)
Bases:
object
Register the lazy and cached version of the iterables.
Initialize an instance of this class.
- Parameters
class_object_first – the first iterable class object (the lazy or cached version)
unidirectional_link – if True, only set the link in the second class passed to the __call__ method
- static register_cached(class_object_lazy: Type[cgnal.core.data.model.core.IterableUtilsMixin], class_object_cached: Type[cgnal.core.data.model.core.IterableUtilsMixin])
Link the lazy and cached versions.
- Parameters
class_object_lazy – the lazy iterable class object
class_object_cached – the cached iterable class object
- static register_lazy(class_object_lazy: Type[cgnal.core.data.model.core.IterableUtilsMixin], class_object_cached: Type[cgnal.core.data.model.core.IterableUtilsMixin])
Link the lazy and cached versions.
- Parameters
class_object_lazy – the lazy iterable class object
class_object_cached – the cached iterable class object
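The registration pattern, a class instantiated with one class and then called on a second one, can be sketched as a class decorator that cross-links the two. Everything below is hypothetical: link_with and the single partner attribute are illustrative names, whereas the library sets distinct lazy and cached links.

```python
from typing import Type


class link_with:
    """Hypothetical class decorator that cross-links two classes."""

    def __init__(self, first: Type, unidirectional: bool = False) -> None:
        self.first = first
        self.unidirectional = unidirectional

    def __call__(self, second: Type) -> Type:
        # The decorated (second) class always learns about the first one.
        second.partner = self.first
        if not self.unidirectional:
            # With a bidirectional link the first class also points back.
            self.first.partner = second
        return second


class Lazy:
    pass


@link_with(Lazy)
class Cached:
    pass


class Standalone:
    pass


@link_with(Standalone, unidirectional=True)
class OneWay:
    pass
```

After decoration Cached and Lazy refer to each other, while Standalone is left untouched because the link was declared unidirectional.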
- class cgnal.core.data.model.core.Serializable
Bases:
abc.ABC
Abstract Class to be used to extend objects that can be serialised.
- abstract classmethod load(filename: Union[str, os.PathLike[str]]) Serializable
Load class from a file.
- Parameters
filename – filename
- abstract write(filename: Union[str, os.PathLike[str]]) None
Write class to a file.
- Parameters
filename – filename
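The write/load contract can be sketched with the standard pickle module. The class names below are hypothetical, and the sketch stores only the instance's attribute dictionary, a simplification that is not necessarily what PickleSerialization or DillSerialization do.

```python
import os
import pickle
import tempfile
from abc import ABC, abstractmethod


class FileSerializable(ABC):
    """Hypothetical minimal version of a write/load interface."""

    @abstractmethod
    def write(self, filename: str) -> None:
        ...

    @classmethod
    @abstractmethod
    def load(cls, filename: str) -> "FileSerializable":
        ...


class PickleBacked(FileSerializable):
    def write(self, filename: str) -> None:
        # Simplification: persist the attribute dictionary only.
        with open(filename, "wb") as fid:
            pickle.dump(vars(self), fid)

    @classmethod
    def load(cls, filename: str) -> "PickleBacked":
        with open(filename, "rb") as fid:
            state = pickle.load(fid)
        instance = cls.__new__(cls)  # bypass __init__, then restore the state
        instance.__dict__.update(state)
        return instance


class Point(PickleBacked):
    def __init__(self, x: int, y: int) -> None:
        self.x, self.y = x, y


with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "point.pkl")
    Point(1, 2).write(path)
    restored = Point.load(path)
```

Because load is a classmethod, the restored object has the type of the class it was called on. Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.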
cgnal.core.data.model.ml module
Module for specifying data-models to be used in modelling.
- class cgnal.core.data.model.ml.CachedDataset(*args, **kwargs)
Bases:
cgnal.core.data.model.ml.DatasetUtilsMixin[cgnal.core.data.model.ml.FeatType, cgnal.core.data.model.ml.LabType], cgnal.core.data.model.core.CachedIterable[cgnal.core.data.model.ml.Sample[cgnal.core.data.model.ml.FeatType, cgnal.core.data.model.ml.LabType]], cgnal.core.data.model.core.DillSerialization
Class that represents a dataset cached in memory, derived from a cached iterable of samples.
Return instance of a class to be used for implementing cached iterables.
- Parameters
items – sequence or iterable of elements
- cached_type
- lazy_type
alias of cgnal.core.data.model.ml.LazyDataset
- to_df() pandas.core.frame.DataFrame
Reformat the Features and Labels as a DataFrame.
- Returns
DataFrame, Dataframe with features and labels
- class cgnal.core.data.model.ml.DatasetUtilsMixin(*args, **kwargs)
Bases:
cgnal.core.data.model.core.IterableUtilsMixin[cgnal.core.data.model.ml.Sample[cgnal.core.data.model.ml.FeatType, cgnal.core.data.model.ml.LabType], LazyDataset[FeatType, LabType], CachedDataset[FeatType, LabType]], Generic[cgnal.core.data.model.ml.FeatType, cgnal.core.data.model.ml.LabType], abc.ABC
Base class for representing datasets as iterables over Samples.
Create a new instance of this class.
- Parameters
cls – parent object class
args – passed to the super class __new__ method
kwargs – passed to the super class __new__ method
- Raises
RuntimeError – if the cached and lazy versions were not defined before instantiating the class
- Returns
an instance of this class
- property asPandasDataset: cgnal.core.data.model.ml.PandasDataset
Cast object as a PandasDataset.
- Returns
dataset
- cached_type: Type[cgnal.core.data.model.core.CachedIterableType]
- static checkNames(x: Optional[Union[int, str, Any]]) Union[str, int]
Check that feature names comply with format and cast them to either string or int.
- Parameters
x – feature name
- Returns
name as int or str
- Raises
AttributeError – if x is None
- getFeaturesAs(type: typing_extensions.Literal[array]) numpy.ndarray
- getFeaturesAs(type: typing_extensions.Literal[pandas]) pandas.core.frame.DataFrame
- getFeaturesAs(type: typing_extensions.Literal[dict]) Dict[Union[str, int], cgnal.core.data.model.ml.FeatType]
- getFeaturesAs(type: typing_extensions.Literal[list]) List[cgnal.core.data.model.ml.FeatType]
- getFeaturesAs(type: typing_extensions.Literal[lazy]) Iterator[cgnal.core.data.model.ml.FeatType]
Return object of the specified type containing the feature space.
- Parameters
type – type of return. Can be one of “pandas”, “dict”, “list” or “array”
- Returns
an object of the specified type containing the features
- Raises
ValueError – if the provided type is not one of the allowed ones
- getLabelsAs(type: typing_extensions.Literal[array]) numpy.ndarray
- getLabelsAs(type: typing_extensions.Literal[pandas]) pandas.core.frame.DataFrame
- getLabelsAs(type: typing_extensions.Literal[dict]) Dict[Union[str, int], cgnal.core.data.model.ml.LabType]
- getLabelsAs(type: typing_extensions.Literal[list]) List[cgnal.core.data.model.ml.LabType]
- getLabelsAs(type: typing_extensions.Literal[lazy]) Iterator[cgnal.core.data.model.ml.LabType]
Return an object of the specified type containing the labels.
- Parameters
type – type of return. Can be one of “pandas”, “dict”, “list” or “array”
- Returns
an object of the specified type containing the labels
- Raises
ValueError – if the provided type is not one of the allowed ones
- lazy_type: Type[cgnal.core.data.model.core.LazyIterableType]
- type()
Return the type of the objects in the Iterable.
- Returns
type of the object of the iterable
- union(other: cgnal.core.data.model.core.BaseIterable[cgnal.core.data.model.ml.Sample[cgnal.core.data.model.ml.FeatType, cgnal.core.data.model.ml.LabType]]) cgnal.core.data.model.ml.DatasetUtilsMixin[cgnal.core.data.model.ml.FeatType, cgnal.core.data.model.ml.LabType]
Return a union of datasets.
- Parameters
other – Dataset
- Returns
LazyDataset
- Raises
TypeError – other is not an instance of Dataset
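The getFeaturesAs and getLabelsAs methods above select their return container from a string argument and raise ValueError for unknown values. A minimal sketch of that dispatch follows; it is an illustration only, with samples modelled as hypothetical (name, features, label) tuples, a hypothetical parameter name kind, and the “array” and “pandas” branches left out to stay within the standard library.

```python
from typing import Dict, Iterator, List, Tuple, Union

Sample = Tuple[str, List[float], int]  # hypothetical (name, features, label)


def get_features_as(
    samples: List[Sample], kind: str
) -> Union[List[List[float]], Dict[str, List[float]], Iterator[List[float]]]:
    """Return the features in the container selected by `kind`."""
    if kind == "list":
        return [features for _, features, _ in samples]
    if kind == "dict":
        # Keyed by sample name, mirroring the Dict[Union[str, int], ...] overload.
        return {name: features for name, features, _ in samples}
    if kind == "lazy":
        # A generator expression: features are produced on demand.
        return (features for _, features, _ in samples)
    raise ValueError(f"Type {kind} not allowed")


samples = [("a", [1.0, 2.0], 0), ("b", [3.0, 4.0], 1)]
as_list = get_features_as(samples, "list")
as_dict = get_features_as(samples, "dict")
as_lazy = get_features_as(samples, "lazy")
```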
- class cgnal.core.data.model.ml.LazyDataset(*args, **kwargs)
Bases:
cgnal.core.data.model.core.LazyIterable[cgnal.core.data.model.ml.Sample[cgnal.core.data.model.ml.FeatType, cgnal.core.data.model.ml.LabType]], cgnal.core.data.model.ml.DatasetUtilsMixin[cgnal.core.data.model.ml.FeatType, cgnal.core.data.model.ml.LabType]
Class that represents a dataset derived from a lazy iterable of samples.
Return an instance of the class to be used for implementing lazy iterables.
- Parameters
items – IterGenerator containing the generator of items
- cached_type
- features() Iterator[cgnal.core.data.model.ml.FeatType]
Return an iterator over sample features.
- Returns
iterable of features
- getFeaturesAs(type: typing_extensions.Literal[array]) numpy.ndarray
- getFeaturesAs(type: typing_extensions.Literal[pandas]) pandas.core.frame.DataFrame
- getFeaturesAs(type: typing_extensions.Literal[dict]) Dict[Union[str, int], cgnal.core.data.model.ml.FeatType]
- getFeaturesAs(type: typing_extensions.Literal[list]) List[cgnal.core.data.model.ml.FeatType]
- getFeaturesAs(type: typing_extensions.Literal[lazy]) Iterator[cgnal.core.data.model.ml.FeatType]
Return object of the specified type containing the feature space.
- Parameters
type – type of return. Can be one of “pandas”, “dict”, “list” or “array”
- Returns
an object of the specified type containing the features
- getLabelsAs(type: typing_extensions.Literal[array]) numpy.ndarray
- getLabelsAs(type: typing_extensions.Literal[pandas]) pandas.core.frame.DataFrame
- getLabelsAs(type: typing_extensions.Literal[dict]) Dict[Union[str, int], cgnal.core.data.model.ml.LabType]
- getLabelsAs(type: typing_extensions.Literal[list]) List[cgnal.core.data.model.ml.LabType]
- getLabelsAs(type: typing_extensions.Literal[lazy]) Iterator[cgnal.core.data.model.ml.LabType]
Return an object of the specified type containing the labels.
- Parameters
type – type of return. Can be one of “pandas”, “dict”, “list”, “array” or iterators
- Returns
an object of the specified type containing the labels
- labels() Iterator[cgnal.core.data.model.ml.LabType]
Return an iterator over sample labels.
- Returns
iterable of labels
- lazy_type
alias of
cgnal.core.data.model.ml.LazyDataset
- withLookback(lookback: int) cgnal.core.data.model.ml.LazyDataset
Create a LazyDataset with features that are an array of lookback lists of samples’ features.
- Parameters
lookback – number of samples’ features to look at
- Returns
LazyDataset with changed samples
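One plausible reading of withLookback is a sliding window over consecutive samples' features. The sketch below illustrates that reading only; with_lookback is a hypothetical function, and skipping the first incomplete windows is an assumption, as is ignoring how labels and names are attached to the new samples.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, List, Sequence


def with_lookback(
    features: Iterable[Sequence[float]], lookback: int
) -> Iterator[List[Sequence[float]]]:
    """Yield windows holding the last `lookback` feature vectors."""
    window: Deque[Sequence[float]] = deque(maxlen=lookback)
    for item in features:
        window.append(item)  # the oldest element is dropped automatically
        if len(window) == lookback:
            yield list(window)


windows = list(with_lookback([[1], [2], [3], [4]], lookback=2))
```

Since the function is a generator, windows are produced one at a time, which matches the lazy nature of the dataset it is meant to illustrate.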
- class cgnal.core.data.model.ml.MultiFeatureSample(features: List[numpy.ndarray], label: Optional[cgnal.core.data.model.ml.LabType] = None, name: Optional[str] = None)
Bases:
cgnal.core.data.model.ml.Sample[List[numpy.ndarray], cgnal.core.data.model.ml.LabType]
Class representing an observation defined by a nested list of arrays.
Object representing a single sample of a training or test set.
- Parameters
features – features of the sample
label – labels of the sample (optional)
name – id of the sample (optional)
- class cgnal.core.data.model.ml.PandasDataset(*args, **kwargs)
Bases:
Generic[cgnal.core.data.model.ml.FeatType, cgnal.core.data.model.ml.LabType], cgnal.core.data.model.ml.DatasetUtilsMixin[cgnal.core.data.model.ml.FeatType, cgnal.core.data.model.ml.LabType], cgnal.core.data.model.core.DillSerialization
Dataset represented via pandas Dataframes for features and labels.
Return a data structure built on top of pandas dataframes.
The PandasDataset packs features and labels together and exposes them as a pandas dataframe, numpy array or dictionary. For unsupervised learning tasks the labels are left as None.
- Parameters
features – a dataframe or a series of features
labels – a dataframe or a series of labels. None in case no labels are present.
- Raises
TypeError – if the labels or features are neither DataFrames nor Series
- property cached: bool
Return whether the dataset is cached or not in memory.
- Returns
boolean
- cached_type
- classmethod createObject(features: Union[pandas.core.frame.DataFrame, pandas.core.series.Series], labels: Optional[Union[pandas.core.frame.DataFrame, pandas.core.series.Series]]) cgnal.core.data.model.ml.TPandasDataset
Create a PandasDataset object.
- Parameters
features – features as pandas dataframe/series
labels – labels as pandas dataframe/series
- Returns
a PandasDataset object
- dropna(**kwargs) cgnal.core.data.model.ml.TPandasDataset
Drop NAs from feature and labels.
- Parameters
kwargs – keyworded arguments are passed to dropna
- Returns
PandasDataset with features and labels without NAs
- classmethod empty() cgnal.core.data.model.ml.TPandasDataset
Return empty object.
- Returns
Empty instance of class
- property features: pandas.core.frame.DataFrame
Get features as pandas dataframe.
- Returns
pd.DataFrame
- classmethod from_sequence(datasets: Sequence[cgnal.core.data.model.ml.TPandasDataset]) cgnal.core.data.model.ml.TPandasDataset
Create a PandasDataset from a list of pandas datasets using pd.concat.
- Parameters
datasets – list of PandasDatasets
- Returns
PandasDataset
- getFeaturesAs(type: typing_extensions.Literal[array]) numpy.ndarray
- getFeaturesAs(type: typing_extensions.Literal[pandas]) pandas.core.frame.DataFrame
- getFeaturesAs(type: typing_extensions.Literal[dict]) Dict[Union[str, int], cgnal.core.data.model.ml.FeatType]
- getFeaturesAs(type: typing_extensions.Literal[list]) List[cgnal.core.data.model.ml.FeatType]
- getFeaturesAs(type: typing_extensions.Literal[lazy]) Iterator[cgnal.core.data.model.ml.FeatType]
Get features as numpy array, pandas dataframe or dictionary.
- Parameters
type – str, default is ‘array’, can be ‘array’, ‘pandas’, ‘dict’
- Returns
features according to the given type
- Raises
ValueError – provided type not allowed
- getLabelsAs(type: typing_extensions.Literal[array]) numpy.ndarray
- getLabelsAs(type: typing_extensions.Literal[pandas]) pandas.core.frame.DataFrame
- getLabelsAs(type: typing_extensions.Literal[dict]) Dict[Union[str, int], cgnal.core.data.model.ml.LabType]
- getLabelsAs(type: typing_extensions.Literal[list]) List[cgnal.core.data.model.ml.LabType]
- getLabelsAs(type: typing_extensions.Literal[lazy]) Iterator[cgnal.core.data.model.ml.LabType]
Get labels as numpy array, pandas dataframe or dictionary.
- Parameters
type – str, default is ‘array’, can be ‘array’, ‘pandas’, ‘dict’
- Returns
labels according to the given type
- Raises
ValueError – provided type not allowed
- property index: pandas.core.indexes.base.Index
Get Dataset index.
- Returns
pd.Index
- intersection() cgnal.core.data.model.ml.TPandasDataset
Intersect feature and labels indices.
- Returns
PandasDataset with features and labels with intersected indices
- property items: Iterator[cgnal.core.data.model.ml.Sample]
Get features as an iterator of Samples.
- Yield
Iterator of objects of cgnal.data.model.ml.Sample
- property labels: pandas.core.frame.DataFrame
Get labels as a pandas dataframe.
- Returns
pd.DataFrame
- lazy_type
alias of cgnal.core.data.model.ml.LazyDataset
- loc(idx: List[Any]) cgnal.core.data.model.ml.TPandasDataset
Find given indices in features and labels.
- Parameters
idx – input indices
- Returns
PandasDataset with features and labels filtered on input indices
- takeAsPandas(n: int) cgnal.core.data.model.ml.TPandasDataset
Return top n records as a PandasDataset.
- Parameters
n – int specifying number of records to output
- Returns
PandasDataset of length n
- union(other: cgnal.core.data.model.core.BaseIterable[cgnal.core.data.model.ml.Sample[cgnal.core.data.model.ml.FeatType, cgnal.core.data.model.ml.LabType]]) cgnal.core.data.model.ml.DatasetUtilsMixin[cgnal.core.data.model.ml.FeatType, cgnal.core.data.model.ml.LabType]
Return a union between datasets.
- Parameters
other – Dataset to be merged
- Returns
Dataset resulting from the merge
- class cgnal.core.data.model.ml.PandasTimeIndexedDataset(*args, **kwargs)
Bases:
cgnal.core.data.model.ml.PandasDataset
Class to be used for datasets that have time-indexed samples.
Return a data structure built on top of pandas dataframes that packs time-indexed features and labels together.
Features and labels can be obtained as a pandas dataframe, numpy array or a dictionary. For unsupervised learning tasks the labels are left as None.
- Parameters
features – pandas dataframe/series where index elements are dates in string format
labels – pandas dataframe/series where index elements are dates in string format
- class cgnal.core.data.model.ml.Sample(features: cgnal.core.data.model.ml.FeatType, label: Optional[cgnal.core.data.model.ml.LabType] = None, name: Optional[Union[int, str, Any]] = None)
Bases:
cgnal.core.data.model.core.DillSerialization, Generic[cgnal.core.data.model.ml.FeatType, cgnal.core.data.model.ml.LabType]
Base class for representing a sample/observation.
Return an object representing a single sample of a training or test set.
- Parameters
features – features of the sample
label – labels of the sample (optional)
name – id of the sample (optional)
- cgnal.core.data.model.ml.features_and_labels_to_dataset(X: Union[pandas.core.frame.DataFrame, pandas.core.series.Series], y: Optional[Union[pandas.core.frame.DataFrame, pandas.core.series.Series]] = None) cgnal.core.data.model.ml.CachedDataset
Pack features and labels into a CachedDataset.
- Parameters
X – features which can be a pandas dataframe or a pandas series object
y – labels which can be a pandas dataframe or a pandas series object
- Returns
an instance of cgnal.data.model.ml.CachedDataset
cgnal.core.data.model.text module
Module for providing abstraction and classes for handling NLP data.
- class cgnal.core.data.model.text.CachedDocuments(*args, **kwargs)
Bases:
cgnal.core.data.model.core.CachedIterable[cgnal.core.data.model.text.Document], cgnal.core.data.model.text.DocumentsUtilsMixin, cgnal.core.data.model.core.DillSerialization
Class representing a collection of documents cached in memory.
Return instance of a class to be used for implementing cached iterables.
- Parameters
items – sequence or iterable of elements
- cached_type
- lazy_type
- to_df(fields: Optional[List[str]] = None) pandas.core.frame.DataFrame
Represent the corpus of documents as a table by unpacking provided fields as columns.
- Parameters
fields – Name of the document property to be unpacked as columns
- Returns
dataframe representing the corpus with the given fields
- class cgnal.core.data.model.text.Document(uuid: cgnal.core.data.model.text.K, data: Dict[str, Any])
Bases:
Generic[cgnal.core.data.model.text.K]
Document representation as a pair of uuid and dictionary of information.
Return instance of a document.
- Parameters
uuid – document id
data – document data as a dictionary
- addProperty(key: str, value: Any) cgnal.core.data.model.text.Document
Generate new Document instance with given new data element.
- Parameters
key – key of the data element to add
value – value of the data element to add
- Returns
Document with new given data element
- property author: Optional[str]
Retrieve ‘author’ field.
- Returns
author data field value
- getOrThrow(key: str, default: Optional[Any] = None) Optional[Any]
Retrieve value associated to given key or return default value.
- Parameters
key – key to retrieve
default – default value to return
- Returns
retrieved element
- Raises
KeyError – if key not found and default not provided
- items() Iterator[Tuple[str, Any]]
Yield data items.
- Yield
iterator with tuples of data properties names and values
- property language: Optional[str]
Retrieve ‘language’ field.
- Returns
language data field value
- property properties: Iterator[str]
Yield data properties names.
- Yield
iterator with data properties names
- removeProperty(key: str) cgnal.core.data.model.text.Document
Generate new Document instance without given data element.
- Parameters
key – key of data element to remove
- Returns
Document without given data element
- setRandomUUID() cgnal.core.data.model.text.Document
Generate new document instance with the same data as the current one but with random uuid.
- Returns
Document instance with the same data as the current one but with random uuid
- property text: Optional[str]
Retrieve ‘text’ field.
- Returns
text data field value
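The Document methods are described as returning new instances rather than mutating the current one. The sketch below shows that style with a hypothetical Doc class; it follows the documented behaviour of addProperty, removeProperty and getOrThrow as closely as the descriptions allow, but the details (for example treating a None default as "not provided") are assumptions.

```python
from typing import Any, Dict, Optional


class Doc:
    """Hypothetical stand-in for an id-plus-dictionary record."""

    def __init__(self, uuid: str, data: Dict[str, Any]) -> None:
        self.uuid = uuid
        self.data = data

    def add_property(self, key: str, value: Any) -> "Doc":
        # Build a new dictionary so the original document is left untouched.
        return Doc(self.uuid, {**self.data, key: value})

    def remove_property(self, key: str) -> "Doc":
        return Doc(self.uuid, {k: v for k, v in self.data.items() if k != key})

    def get_or_throw(self, key: str, default: Optional[Any] = None) -> Any:
        try:
            return self.data[key]
        except KeyError:
            if default is not None:
                return default
            raise KeyError(f"{key} not found and no default provided")


doc = Doc("a1", {"text": "hello"})
tagged = doc.add_property("language", "en")
without_text = tagged.remove_property("text")
```

The original doc keeps its single "text" entry throughout, which makes such objects safe to share between the lazy and cached collections.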
- class cgnal.core.data.model.text.DocumentsUtilsMixin(*args, **kwargs)
Bases:
cgnal.core.data.model.core.IterableUtilsMixin[cgnal.core.data.model.text.Document, LazyDocuments, CachedDocuments]
Utilities for Documents iterables.
Create a new instance of this class.
- Parameters
cls – parent object class
args – passed to the super class __new__ method
kwargs – passed to the super class __new__ method
- Raises
RuntimeError – if the cached and lazy versions were not defined before instantiating the class
- Returns
an instance of this class
- cached_type: Type[cgnal.core.data.model.core.CachedIterableType]
- lazy_type: Type[cgnal.core.data.model.core.LazyIterableType]
- property type: Type[cgnal.core.data.model.text.Document]
Return the type of the objects in the Iterable.
- Returns
Document class object
- class cgnal.core.data.model.text.LazyDocuments(*args, **kwargs)
Bases:
cgnal.core.data.model.core.LazyIterable[cgnal.core.data.model.text.Document], cgnal.core.data.model.text.DocumentsUtilsMixin
Class representing a collection of documents provided by a generator.
Return an instance of the class to be used for implementing lazy iterables.
- Parameters
items – IterGenerator containing the generator of items
- cached_type
- lazy_type
- cgnal.core.data.model.text.generate_random_uuid() bytes
Create a random number with 12 digits.
- Returns
uuid
Module contents
Data model module.
In this module the data types used in CGnal framework are defined.