GraphDataModeler

from neo4j_runway import GraphDataModeler

This class generates a graph data model by communicating with an LLM. It handles prompt generation, tracks the model generation history, and provides access to the generated data models.

Attributes
----------
llm : BaseLLM
    The LLM used to generate data models.
discovery : Union[str, Discovery], optional
    Either a string containing the LLM generated
    discovery or a Discovery object that has been run.
user_input : Union[Dict[str, str], UserInput], optional
    Either a dictionary with keys general_description
    and column names with descriptions or a UserInput
    object.
model_iterations : int
    The number of times a data model has been returned.
model_history : List[DataModel]
    A list of all valid models generated.
current_model : DataModel
    The most recently generated or loaded data model.

Class Methods

__init__

Takes an LLM instance and Discovery information. Either a Discovery object can be provided, or each field can be provided individually.

Parameters
----------
llm : BaseLLM
    The LLM used to generate data models.
discovery : Union[str, Discovery], optional
    Either a string containing the LLM generated
    discovery or a Discovery object that has been run.
    If a Discovery object is provided then the remaining
    discovery attributes don't need to be provided, by
    default ""
user_input : Union[Dict[str, str], UserInput], optional
    Either a dictionary with keys general_description
    and column names with descriptions or a UserInput
    object, by default dict()
data_dictionary : Dict[str, Any], optional
    A data dictionary. If single-file input, then the
    keys will be column names and the values are
    descriptions.
    If multi-file input, the keys are file names and
    each contain a nested dictionary of column name keys
    and description values.
    This argument will take precedence over any data
    dictionary provided via the Discovery object.
    This argument will take precedence over the
    allowed_columns argument. By default None
allowed_columns : List[str], optional
    A list of allowed columns for modeling. Can be used
    only for single-file inputs. By default list()
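
The two data dictionary shapes described above, and the stated precedence of data_dictionary over allowed_columns, can be sketched with plain dictionaries. This is an illustration only: the file names, column names, and the helper resolve_single_file_columns are hypothetical and are not part of the library.

```python
from typing import Dict, List, Optional

# Single-file input: column name -> description.
single_file_dictionary: Dict[str, str] = {
    "id": "unique identifier of a person",
    "name": "full name of the person",
}

# Multi-file input: file name -> {column name -> description}.
multi_file_dictionary: Dict[str, Dict[str, str]] = {
    "people.csv": {"id": "unique identifier of a person", "name": "full name"},
    "pets.csv": {"pet_id": "unique identifier of a pet", "owner_id": "id of the owner"},
}


def resolve_single_file_columns(
    data_dictionary: Optional[Dict[str, str]] = None,
    allowed_columns: Optional[List[str]] = None,
) -> List[str]:
    """Hypothetical helper: a provided data dictionary wins over allowed_columns."""
    if data_dictionary:
        # The dictionary keys are the column names.
        return list(data_dictionary)
    # Fall back to the explicit column list (or nothing at all).
    return list(allowed_columns or [])
```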

create_initial_model

Generate the initial model. You may access this model with the get_model method by providing version=1.

Parameters
----------
max_retries : int, optional
    The max number of retries for generating the initial
    model, by default 3
use_advanced_data_model_generation_rules : bool, optional
    Whether to include advanced data modeling rules, by
    default True
allow_duplicate_properties : bool, optional
    Whether to allow a property to exist on multiple
    node labels or relationship types, by default False
enforce_uniqueness : bool, optional
    Whether to error if a node has no unique identifiers
    (unique or node key).
    Setting this to false may be detrimental during code
    generation and ingestion. By default True
allow_parallel_relationships : bool, optional
    Whether to allow parallel relationships to exist in
    the data model, by default False
apply_neo4j_naming_conventions : bool, optional
    Whether to apply Neo4j naming conventions to the
    generated Data Model, by default True

Returns
-------
DataModel
    The generated data model.
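
To make the enforce_uniqueness and allow_duplicate_properties flags more concrete, here is a minimal sketch of the kind of check each one describes. It assumes a simplified, hypothetical representation of a data model (node label mapped to a list of property dictionaries); the library's actual validation logic is not shown in this reference and may differ.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical, simplified stand-in for a data model:
# node label -> list of properties, each flagged as a unique identifier or not.
NodeProperties = Dict[str, List[Dict[str, object]]]


def nodes_without_unique_identifier(nodes: NodeProperties) -> List[str]:
    """Labels that enforce_uniqueness=True would be expected to reject."""
    return [
        label
        for label, props in nodes.items()
        if not any(prop["is_unique"] for prop in props)
    ]


def properties_on_multiple_labels(nodes: NodeProperties) -> List[str]:
    """Property names that appear on more than one node label."""
    counts = Counter(
        name
        for props in nodes.values()
        for name in {str(prop["name"]) for prop in props}
    )
    return sorted(name for name, count in counts.items() if count > 1)


example_nodes: NodeProperties = {
    "Person": [
        {"name": "id", "is_unique": True},
        {"name": "name", "is_unique": False},
    ],
    "Pet": [{"name": "name", "is_unique": False}],
}
```

In this example, "Pet" has no unique identifier and "name" exists on both labels, so a model like this would conflict with the default settings of both flags.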

get_model

Get the specified data model version. By default, returns the most recent model. Versions are 1-indexed.

Parameters
----------
version : int, optional
    The model version, 1-indexed. The default of -1
    returns the most recent model.
as_dict : bool, optional
    Whether to return as a Python dictionary. Will
    otherwise return a DataModel object, by default
    False

Returns
-------
Union[DataModel, Dict[str, Any]]
    The data model in desired format.

Examples
--------
>>> gdm.get_model(1) == gdm.model_history[0]
True
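
The 1-indexed lookup can be sketched against a plain list. The function get_version below is a hypothetical stand-in, not the library's implementation; in particular, raising ValueError for an empty history or an out-of-range version is an assumption.

```python
from typing import List, TypeVar

T = TypeVar("T")


def get_version(history: List[T], version: int = -1) -> T:
    """Return a 1-indexed version from history; -1 means the most recent."""
    if not history:
        raise ValueError("no models have been generated or loaded yet")
    if version == -1:
        return history[-1]
    if version < 1 or version > len(history):
        # Assumed behavior: reject 0, other negatives, and versions past the end.
        raise ValueError(f"version must be -1 or between 1 and {len(history)}")
    # Version 1 is stored at list index 0.
    return history[version - 1]
```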

iterate_model

Iterate on the current model. At least one data model must exist in model_history before this method can run.

Parameters
----------
iterations : int, optional
    How many times to perform model generation. Each
    successful iteration will be appended to the
    GraphDataModeler model_history.
    For example, if a value of 2 is provided, then two
    successful models will be appended to the
    model_history. Model generation will use the same
    prompt for each generation attempt. By default 1
corrections : Union[str, None], optional
    What changes the user would like the LLM to address
    in the next model, by default None
max_retries : int, optional
    The max number of retries for generating a model,
    by default 3
use_advanced_data_model_generation_rules : bool, optional
    Whether to include advanced data modeling rules, by
    default True
allow_duplicate_properties : bool, optional
    Whether to allow a property to exist on multiple
    node labels or relationship types, by default False
enforce_uniqueness : bool, optional
    Whether to error if a node has no unique identifiers
    (unique or node key).
    Setting this to false may be detrimental during code
    generation and ingestion. By default True
allow_parallel_relationships : bool, optional
    Whether to allow parallel relationships to exist in
    the data model, by default False
apply_neo4j_naming_conventions : bool, optional
    Whether to apply Neo4j naming conventions to the
    generated Data Model, by default True

Returns
-------
DataModel
    The most recently generated data model.
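
The interaction between iterations, max_retries, and model_history can be sketched with a stand-in generator function. Everything here is hypothetical: iterate_history and generate are illustrative names, and interpreting max_retries as "one initial attempt plus max_retries retries" is an assumption, as is raising RuntimeError when the retry budget runs out.

```python
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")


def iterate_history(
    history: List[T],
    generate: Callable[[T], Optional[T]],
    iterations: int = 1,
    max_retries: int = 3,
) -> T:
    """Append one successful model per iteration and return the newest one.

    generate receives the current model and returns a new model,
    or None when the attempt did not produce a valid model.
    """
    if not history:
        raise ValueError("a data model must exist before iterating")
    for _ in range(iterations):
        # Assumption: one initial attempt plus up to max_retries retries.
        for _attempt in range(1 + max_retries):
            candidate = generate(history[-1])
            if candidate is not None:
                history.append(candidate)
                break
        else:
            raise RuntimeError("no valid model generated within the retry budget")
    return history[-1]
```

Only successful attempts grow the history, which matches the description of the iterations parameter: a value of 2 adds exactly two models, regardless of how many attempts were needed.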

load_model

Append a new data model to the end of the model_history. This will become the new current_model.

Parameters
----------
data_model : DataModel
    The new data model.

Raises
------
ValueError
    If the data_model is not an instance of DataModel.
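
The documented behavior (type check, then append) can be sketched as follows. The DataModel class here is an empty stand-in for the real one, and the free function load_model only mirrors the description above; it is not the library's code.

```python
from typing import List


class DataModel:
    """Empty stand-in for the library's DataModel class."""


def load_model(history: List[DataModel], data_model: object) -> None:
    """Append data_model to history, rejecting anything that is not a DataModel."""
    if not isinstance(data_model, DataModel):
        raise ValueError("data_model must be an instance of DataModel")
    # The appended model is now the last entry, i.e. the current model.
    history.append(data_model)
```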

Class Properties

allowed_columns

The allowed columns for model generation. If multi-file, then a dictionary with file name keys and list of columns for values. If single-file, then a list of columns.

Returns
-------
Dict[str, List[str]]
    The allowed columns for data model generation.

Raises
------
AssertionError
    When no _data_dictionary attribute is initialized in
    the GraphDataModeler class.
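
As a rough illustration of the two return shapes described above, the sketch below derives allowed columns from a data dictionary. The helper allowed_columns_from is hypothetical, and detecting multi-file input by checking whether every value is a nested dictionary is an assumption about how the distinction could be made.

```python
from typing import Any, Dict, List, Union


def allowed_columns_from(
    data_dictionary: Dict[str, Any],
) -> Union[List[str], Dict[str, List[str]]]:
    """Hypothetical helper: derive allowed columns from a data dictionary."""
    is_multi_file = bool(data_dictionary) and all(
        isinstance(value, dict) for value in data_dictionary.values()
    )
    if is_multi_file:
        # File name -> list of that file's column names.
        return {
            file_name: list(columns)
            for file_name, columns in data_dictionary.items()
        }
    # Single-file: the keys are the column names.
    return list(data_dictionary)
```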

current_model

Get the most recently created or loaded data model.

Returns
-------
DataModel
    The current data model.

current_model_viz

Visualize the most recent model with Graphviz.

Returns
-------
Digraph
    The object to visualize.

is_multifile

Whether data is multi-file or not.

Returns
-------
bool
    True if multi-file input is detected, else False.