cgnal.core.data.layer.pandas package
Submodules
cgnal.core.data.layer.pandas.archivers module
Module with abstractions for accessing data persisted in pickles, mimicking a fictitious database.
- class cgnal.core.data.layer.pandas.archivers.CsvArchiver(filename: Union[str, os.PathLike[str]], dao: cgnal.core.data.layer.DAO[cgnal.core.typing.T, pandas.core.series.Series], sep: str = ';')
Bases: cgnal.core.data.layer.pandas.archivers.PandasArchiver[cgnal.core.typing.T]
Archiver for a persistent layer backed by tabular files stored as CSV.
Create an in-memory archiver based on structured data stored in the filesystem as a CSV.
- Parameters
filename – str, path object or file-like object. Any valid string path to a CSV file.
dao – An instance of cgnal.core.data.layer.pandas.dao.DocumentDAO/SeriesDAO/DataFrameDAO that helps to retrieve/archive a pd.DataFrame row.
sep – str, default ‘;’. Delimiter to use.
- Raises
TypeError – if filename is not a string
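The role of the sep parameter can be illustrated with plain pandas. This is a minimal sketch, not the CsvArchiver implementation: the column names and the in-memory buffer standing in for a file are assumptions made for the example.

```python
import io

import pandas as pd

# Hypothetical file content; a real archiver would read from `filename`.
raw = "uuid;text;lang\nd1;hello;en\nd2;ciao;it\n"

# sep=";" matches the documented default delimiter of CsvArchiver.
df = pd.read_csv(io.StringIO(raw), sep=";", index_col="uuid")

# Writing back with the same delimiter keeps the file readable by the same call.
out = io.StringIO()
df.to_csv(out, sep=";")
```

Reading and writing with the same delimiter is what keeps the in-memory DataFrame and the file on disk consistent.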
- class cgnal.core.data.layer.pandas.archivers.PandasArchiver(dao: cgnal.core.data.layer.DAO[cgnal.core.typing.T, pandas.core.series.Series])
Bases: cgnal.core.data.layer.Archiver[cgnal.core.typing.T], abc.ABC
Archiver for a persistent layer backed by tabular files, represented in memory by a pandas DataFrame.
Create an in-memory archiver based on structured data stored as a pandas DataFrame.
- Parameters
dao – An instance of cgnal.core.data.layer.pandas.dao.DocumentDAO/SeriesDAO/DataFrameDAO that helps to retrieve/archive a pd.DataFrame row.
- Raises
TypeError – if given DAO is not of the correct type
- archive(objs: Union[Iterable[cgnal.core.typing.T], cgnal.core.typing.T])
Insert one or more objects in the underlying pd.DataFrame object.
- Parameters
objs – object or list of objects to be archived. The objects can be instances of cgnal.core.data.model.text.Document, pd.DataFrame or pd.Series.
- Returns
self, i.e. an instance of PandasArchiver with an updated self.data object
- archiveMany(objs: Iterable[cgnal.core.typing.T]) cgnal.core.data.layer.pandas.archivers.PandasArchiver
Insert many objects of type Document/pd.DataFrame/pd.Series in a pd.DataFrame.
- Parameters
objs – List of objects to be inserted. The objects can be instances of cgnal.core.data.model.text.Document, pd.DataFrame or pd.Series.
- Returns
self, i.e. an instance of PandasArchiver with an updated self.data object
- archiveOne(obj: cgnal.core.typing.T)
Insert an object of type Document/pd.DataFrame/pd.Series in a pd.DataFrame.
- Parameters
obj – An instance of cgnal.core.data.model.text.Document, pd.DataFrame or pd.Series
- Returns
self, i.e. an instance of PandasArchiver with an updated self.data object
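Because archive, archiveMany and archiveOne all return self, calls can be chained. The sketch below illustrates that pattern with a hypothetical InMemoryArchiver that stores plain dicts keyed by a "uuid" field. The dispatch rule in archive is an assumption for illustration and is not taken from the library source.

```python
from typing import Dict, Hashable, Iterable, Union


class InMemoryArchiver:
    """Hypothetical stand-in for PandasArchiver; stores rows in a dict."""

    def __init__(self) -> None:
        self.data: Dict[Hashable, dict] = {}

    def archiveOne(self, obj: dict) -> "InMemoryArchiver":
        # Assumption: each object carries its own key under "uuid".
        self.data[obj["uuid"]] = obj
        return self

    def archiveMany(self, objs: Iterable[dict]) -> "InMemoryArchiver":
        for obj in objs:
            self.archiveOne(obj)
        return self

    def archive(self, objs: Union[dict, Iterable[dict]]) -> "InMemoryArchiver":
        # A single object is routed to archiveOne, anything else to archiveMany.
        if isinstance(objs, dict):
            return self.archiveOne(objs)
        return self.archiveMany(objs)


# Returning self from every method allows fluent chaining.
archiver = (
    InMemoryArchiver()
    .archive({"uuid": "a", "text": "first"})
    .archive([{"uuid": "b", "text": "second"}, {"uuid": "c", "text": "third"}])
)
stored_keys = sorted(archiver.data)
```

In the documented classes the in-memory changes are only written to the underlying file when commit() is called.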
- commit() cgnal.core.data.layer.pandas.archivers.PandasArchiver
Persist data stored in memory in the file.
- Returns
self
- property data: pandas.core.frame.DataFrame
Return tabular data stored in memory.
- Returns
pd.DataFrame
- retrieve(condition: Optional[Callable[[pandas.core.frame.DataFrame], pandas.core.frame.DataFrame]] = None, sort_by: Optional[Union[str, List[str]]] = None) Iterator[cgnal.core.typing.T]
Retrieve rows satisfying condition, sorted according to given ordering.
- Parameters
condition – condition to satisfy. If None return all rows.
sort_by – ordering to respect. If None, no ordering is given.
- Returns
iterator of (ordered) rows satisfying given condition
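According to the signature, condition is a callable that receives the whole DataFrame and returns the filtered DataFrame, while sort_by is a column name or a list of column names. The following sketch shows how such arguments could be applied. retrieve_rows is a hypothetical helper written for this example, and the column names are assumptions.

```python
import pandas as pd

df = pd.DataFrame(
    {"author": ["ann", "bob", "cat"], "length": [300, 45, 120]},
    index=["id1", "id2", "id3"],
)


def long_documents(frame: pd.DataFrame) -> pd.DataFrame:
    # A condition maps a DataFrame to the subset of rows to keep.
    return frame[frame["length"] > 100]


def retrieve_rows(frame, condition=None, sort_by=None):
    # Hypothetical helper mirroring the documented parameters, not library code.
    selected = frame if condition is None else condition(frame)
    if sort_by is not None:
        selected = selected.sort_values(sort_by)
    # Yield each row as a pd.Series; an archiver would pass it to dao.parse.
    for _, row in selected.iterrows():
        yield row


names = [row["author"] for row in retrieve_rows(df, long_documents, sort_by="length")]
```

With condition=None every row is yielded, and with sort_by=None the rows keep the order they have in the DataFrame.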
- retrieveById(uuid: pandas.core.indexes.base.Index) cgnal.core.typing.T
Retrieve a row from a DataFrame by id.
- Parameters
uuid – row id
- Returns
retrieved row parsed according to self.dao
- retrieveGenerator(condition: Optional[Callable[[pandas.core.frame.DataFrame], pandas.core.frame.DataFrame]] = None, sort_by: Optional[Union[str, List[str]]] = None) cgnal.core.data.model.core.IterGenerator[cgnal.core.typing.T]
Retrieve a generator of rows satisfying condition, sorted according to given ordering.
- Parameters
condition – condition to satisfy. If None return all rows.
sort_by – ordering to respect. If None, no ordering is given.
- Returns
generator of (ordered) rows satisfying the given condition. The generator is of type cgnal.core.data.model.core.IterGenerator.
- class cgnal.core.data.layer.pandas.archivers.PickleArchiver(filename: Union[str, os.PathLike[str]], dao: cgnal.core.data.layer.DAO[cgnal.core.typing.T, pandas.core.series.Series])
Bases: cgnal.core.data.layer.pandas.archivers.PandasArchiver[cgnal.core.typing.T]
Archiver for a persistent layer backed by tabular files stored as a pickle.
Create an in-memory archiver based on structured data stored in the filesystem as a Pickle.
- Parameters
filename – str, path object or file-like object. Any valid string path to a pickle file.
dao – An instance of cgnal.core.data.layer.pandas.dao.DocumentDAO/SeriesDAO/DataFrameDAO that helps to retrieve/archive a pd.DataFrame row.
- Raises
TypeError – if filename is not a string
- class cgnal.core.data.layer.pandas.archivers.TableArchiver(table: cgnal.core.data.layer.pandas.databases.Table, dao: cgnal.core.data.layer.DAO[cgnal.core.typing.T, pandas.core.series.Series])
Bases: cgnal.core.data.layer.pandas.archivers.PandasArchiver[cgnal.core.typing.T]
Archiver for a persistent layer backed by tabular data stored in a table of a database.
Create an in-memory archiver based on structured data stored as a table.
- Parameters
table – An instance of cgnal.core.data.layer.pandas.databases.Table
dao – An instance of cgnal.core.data.layer.pandas.dao.DocumentDAO/SeriesDAO/DataFrameDAO that helps to retrieve/archive a pd.DataFrame row.
cgnal.core.data.layer.pandas.dao module
Module with the implementation and abstraction for serializing/deserializing objects into DataFrames.
- class cgnal.core.data.layer.pandas.dao.DataFrameDAO(*args, **kwds)
Bases: cgnal.core.data.layer.DAO[pandas.core.frame.DataFrame, pandas.core.series.Series]
Data Access Object for pd.DataFrames.
- static addName(df: pandas.core.frame.DataFrame, name: Optional[Hashable]) pandas.core.frame.DataFrame
Add name to the input dataframe.
- Parameters
df – pd.DataFrame
name – str
- Returns
pd.DataFrame
- computeKey(df: pandas.core.frame.DataFrame) Hashable
Get dataframe name.
- Parameters
df – pd.DataFrame. A pandas dataframe
- Returns
str, name of the dataframe
- get(df: pandas.core.frame.DataFrame) pandas.core.series.Series
Get dataframe as pd.Series.
- Parameters
df – pd.DataFrame. A pandas dataframe
- Returns
pd.Series
- parse(row: pandas.core.series.Series) pandas.core.frame.DataFrame
Get a row i.e. pd.Series as a pandas DataFrame.
- Parameters
row – pd.Series, row of a pd.DataFrame
- Returns
pd.DataFrame, a pandas dataframe object
- class cgnal.core.data.layer.pandas.dao.DocumentDAO(*args, **kwds)
Bases: cgnal.core.data.layer.DAO[cgnal.core.data.model.text.Document, pandas.core.series.Series]
Data Access Object for documents.
- computeKey(doc: cgnal.core.data.model.text.Document) Hashable
Get document id.
- Parameters
doc – an instance of cgnal.core.data.model.text.Document
- Returns
uuid i.e. id of the given document
- get(doc: cgnal.core.data.model.text.Document) pandas.core.series.Series
Get doc as pd.Series with uuid as name.
- Parameters
doc – an instance of cgnal.core.data.model.text.Document
- Returns
pd.Series
- parse(row: pandas.core.series.Series) cgnal.core.data.model.text.Document
Get a row i.e. pd.Series as a Document.
- Parameters
row – pd.Series, row of a pd.DataFrame
- Returns
cgnal.core.data.model.text.Document, a Document object
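The three DAO methods form a round trip: computeKey gives the row key, get serializes an object into a pd.Series, and parse rebuilds the object from a row. The sketch below shows that contract with a hypothetical SimpleDocument dataclass and SimpleDocumentDAO class. The method names follow this page, while the bodies and the document fields are assumptions, not the library implementation.

```python
from dataclasses import dataclass
from typing import Any, Dict, Hashable

import pandas as pd


@dataclass
class SimpleDocument:
    """Hypothetical stand-in for a Document: an id plus a dict of fields."""

    uuid: str
    data: Dict[str, Any]


class SimpleDocumentDAO:
    """Illustrative DAO; method names follow the docs, bodies are assumptions."""

    def computeKey(self, doc: SimpleDocument) -> Hashable:
        return doc.uuid

    def get(self, doc: SimpleDocument) -> pd.Series:
        # Serialize: the fields become the row values, the uuid the row name.
        return pd.Series(doc.data, name=self.computeKey(doc))

    def parse(self, row: pd.Series) -> SimpleDocument:
        # Deserialize: rebuild the document from the row name and values.
        return SimpleDocument(uuid=row.name, data=row.to_dict())


dao = SimpleDocumentDAO()
doc = SimpleDocument(uuid="doc-1", data={"text": "hello", "lang": "en"})
row = dao.get(doc)
restored = dao.parse(row)
```

Using the key as the Series name means the row lands under that index label when it is appended to a DataFrame, which is what makes lookup by id possible.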
- class cgnal.core.data.layer.pandas.dao.SeriesDAO(mapping: Optional[Dict[Hashable, Hashable]] = None, keys: Optional[Sequence[Hashable]] = None)
Bases: cgnal.core.data.layer.DAO[pandas.core.series.Series, pandas.core.series.Series]
Data Access Object for pd.Series.
Create a Data Access Object for pd.Series.
- Parameters
mapping – mapping of names between the file and the Series. Each key is a name as it appears in the file; the corresponding value is the name we want to have in the Series.
keys – which of the fields in the Series should be used as keys
- computeKey(serie) Union[Hashable, Dict]
Get series name.
- Parameters
serie – pd.Series
- Returns
dict representing the key
- Raises
ValueError – keys and series name have different dimensions
- get(serie: pandas.core.series.Series) pandas.core.series.Series
Get a series as series object.
- Parameters
serie – pd.Series
- Returns
pd.Series
- property inverseMapping: Dict[Hashable, Hashable]
Return the inverse mapping between field names.
- Returns
dict with inverse mapping
- Raises
ValueError – mapping is not invertible because of duplicated values
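A mapping can only be inverted when no two keys share the same value, which is why duplicated values raise a ValueError. A minimal sketch of that check follows. invert_mapping is a hypothetical free function written for this example, not the property's actual code, and the field names are made up.

```python
from typing import Dict, Hashable


def invert_mapping(mapping: Dict[Hashable, Hashable]) -> Dict[Hashable, Hashable]:
    # A dict is invertible only if all of its values are distinct.
    if len(set(mapping.values())) != len(mapping):
        raise ValueError("Mapping is not invertible: duplicated values")
    return {value: key for key, value in mapping.items()}


# Keys are names in the file, values are names in the Series.
file_to_series = {"col_a": "name", "col_b": "age"}
series_to_file = invert_mapping(file_to_series)

# Two file columns mapped to the same Series name cannot be inverted.
try:
    invert_mapping({"col_a": "name", "col_b": "name"})
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True
```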
- parse(row: pandas.core.series.Series) pandas.core.series.Series
Get a row as a pd.Series object.
- Parameters
row – pd.Series
- Returns
pd.Series
cgnal.core.data.layer.pandas.databases module
Module with abstraction for databases and tables.
- class cgnal.core.data.layer.pandas.databases.Database(name: Union[str, os.PathLike[str]], extension: str = '.p')
Bases: cgnal.core.logging.defaults.WithLogging, cgnal.core.data.layer.DatabaseABC
Class representing a Database object.
Return an instance of a class implementing standard read and write methods to pickle data sources.
- Parameters
name – path to pickles
extension – standard pickle extension
- table(table_name: str) cgnal.core.data.layer.pandas.databases.Table
Select table.
- Parameters
table_name – name of the table
- Returns
object of class Table, backed by a pickle file
- property tables: List[str]
Complete pickle names with appropriate extension.
- Returns
pickle names with appropriate extensions
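One way to picture this layout is a directory acting as the database, with one pickle file per table. The sketch below is built on that assumption only: table_path and list_tables are hypothetical helpers, and the way the real Database class resolves names is not shown on this page.

```python
import pickle
import tempfile
from pathlib import Path

EXTENSION = ".p"  # matches the documented default extension


def table_path(db_dir: Path, table_name: str) -> Path:
    # Assumption: a table lives at <database directory>/<table name><extension>.
    return db_dir / f"{table_name}{EXTENSION}"


def list_tables(db_dir: Path) -> list:
    # Assumption: every file carrying the extension counts as one table.
    return sorted(p.name for p in db_dir.glob(f"*{EXTENSION}"))


with tempfile.TemporaryDirectory() as tmp:
    db_dir = Path(tmp)
    for name in ("users", "orders"):
        with open(table_path(db_dir, name), "wb") as fh:
            pickle.dump({"rows": []}, fh)
    tables = list_tables(db_dir)
```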
- class cgnal.core.data.layer.pandas.databases.Table(db: cgnal.core.data.layer.pandas.databases.Database, table_name: str)
Bases: cgnal.core.logging.defaults.WithLogging, cgnal.core.data.layer.TableABC
Class representing a Table in a Database.
Implement a constructor for tables using pickle file format.
- Parameters
db – database to which the table belongs
table_name – name of the table
- Raises
TypeError – if the provided database is not an instance of Database
- property data: pandas.core.frame.DataFrame
Read pickle.
- Returns
pd.DataFrame or pd.Series read from pickle
- property filename: Union[str, os.PathLike[str]]
Return path to pickle.
- Returns
path to pickle file
- to_df(query: Optional[str] = None) pandas.core.frame.DataFrame
Read pickle.
- Parameters
query – query
- Returns
pd.DataFrame or pd.Series read from pickle
- write(df: pandas.core.frame.DataFrame, overwrite: bool = False) None
Write the data to a pickle, possibly outer joined with an input DataFrame.
- Parameters
df – input data
overwrite – whether or not to overwrite existing file
Module contents
Data layer for pandas integration.