PyIngestConfigGenerator
from neo4j_runway.code_generation import PyIngestConfigGenerator
Class responsible for generating the PyIngest YAML config.
Attributes
----------
data_model : DataModel
The data model to base ingestion code on.
file_directory : str, optional
Where the files are located.
file_output_directory : str, optional
The location that generated files should be saved
to.
source_name : str, optional
The name of the data file. If more than one file is
used, do not provide this arg; include the file names
within the data model instead.
strict_typing : bool, optional
Whether to use the types declared in the data model
(True), or infer types during ingestion (False).
username : Union[str, None], optional
The Neo4j username. Providing credentials here will
write them into the configuration. Use with caution!
password : Union[str, None], optional
The Neo4j password. Providing credentials here will
write them into the configuration. Use with caution!
uri : Union[str, None], optional
The Neo4j uri. Providing credentials here will write
them into the configuration. Use with caution!
database : Union[str, None], optional
The Neo4j database. Providing credentials here will
write them into the configuration. Use with caution!
global_batch_size : int, optional
The global batch size to use. Will be overwritten by
any batch sizes declared in the pyingest_file_config
arg.
global_field_separator : Optional[str], optional
The global field separator to use. Will be
overwritten by any field separators declared in the
pyingest_file_config arg.
pyingest_file_config : Optional[Dict[str, Any]], optional
Additional configuration parameters to inject into
the final YAML configuration. Parameters are file
specific.
Supported parameters are: batch_size <int>,
skip_records <int>, skip_file <bool> and
field_separator <str>.
pre_ingest_code : Union[str, List[str], None], optional
Code to be run before data is ingested. This should
include any constraints or indexes that will not be
auto-generated by Runway.
post_ingest_code : Union[str, List[str], None], optional
Code to be run after all data is ingested.
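Because pre_ingest_code and post_ingest_code accept a single string, a list of strings, or None, it can help to see what each form looks like. The following is a minimal sketch, not Runway's internal logic: the helper `as_statement_list`, the `Person` label, and the property names are hypothetical and exist only for illustration.

```python
from typing import List, Union


def as_statement_list(code: Union[str, List[str], None]) -> List[str]:
    """Normalize a str / list / None value into a list of non-empty statements.

    Hypothetical helper; it only illustrates the three accepted input shapes.
    """
    if code is None:
        return []
    if isinstance(code, str):
        code = [code]
    return [stmt.strip() for stmt in code if stmt and stmt.strip()]


# List form: an index that would not be generated from the data model.
# The label and property names are assumptions made for this example.
pre_ingest_code = [
    "CREATE INDEX person_age IF NOT EXISTS FOR (n:Person) ON (n.age);",
]

# String form: a single statement to run once all data is loaded.
post_ingest_code = "MATCH (n:Person) SET n.loaded = true;"

pre_statements = as_statement_list(pre_ingest_code)
post_statements = as_statement_list(post_ingest_code)
no_statements = as_statement_list(None)
```

Either form could be passed as the pre_ingest_code or post_ingest_code argument; the list form keeps each Cypher statement separate and easier to review.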
Class Methods
__init__
Class responsible for generating the PyIngest YAML config. The output is compatible with Runway ingest as well as the original PyIngest.
Parameters
----------
data_model : DataModel
The data model to base ingestion code on.
file_directory : str, optional
Where the files are located, by default "./"
file_output_directory : str, optional
The location that generated files should be saved
to, by default "./"
source_name : str, optional
The name of the CSV file. If more than one CSV is
used, do not provide this arg; include the CSV file
names within the data model instead. By default ""
strict_typing : bool, optional
Whether to use the types declared in the data model
(True), or infer types during ingestion (False). By
default True
username : Union[str, None], optional
The Neo4j username. Providing credentials here will
write them into the configuration. Use with caution!
By default None
password : Union[str, None], optional
The Neo4j password. Providing credentials here will
write them into the configuration. Use with caution!
By default None
uri : Union[str, None], optional
The Neo4j uri. Providing credentials here will write
them into the configuration. Use with caution! By
default None
database : Union[str, None], optional
The Neo4j database. Providing credentials here will
write them into the configuration. Use with caution!
By default None
global_batch_size : int, optional
The global batch size to use. Will be overwritten by
any batch sizes declared in the pyingest_file_config
arg. By default 100
global_field_separator : Optional[str], optional
The global field separator to use. Will be
overwritten by any field separators declared in the
pyingest_file_config arg. By default None
pyingest_file_config : Optional[Dict[str, Any]], optional
Additional configuration parameters to inject into
the final YAML configuration. Parameters are file
specific.
Supported parameters are: batch_size <int>,
skip_records <int>, skip_file <bool> and
field_separator <str>. By default dict()
Example:
pyingest_file_config = {
    "A.csv": {"field_separator": "|", "skip_file": False, "skip_records": 5},
    "B.csv": {"skip_file": True, "batch_size": 1234},
}
pre_ingest_code : Union[str, List[str], None], optional
Code to be run before data is ingested. This should
include any constraints or indexes that will not be
auto-generated by Runway. By default None
post_ingest_code : Union[str, List[str], None], optional
Code to be run after all data is ingested. By
default None
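The precedence described above, where file-specific entries in pyingest_file_config override global_batch_size and global_field_separator, can be sketched as a plain dictionary merge. This is an illustration under stated assumptions, not the library's implementation: the function `effective_file_settings` and the constant `SUPPORTED_KEYS` are hypothetical names, and rejecting unknown keys is a choice made for this example only.

```python
from typing import Any, Dict, Optional

# The four file-specific parameters listed in the documentation above.
SUPPORTED_KEYS = {"batch_size", "skip_records", "skip_file", "field_separator"}


def effective_file_settings(
    file_name: str,
    global_batch_size: int = 100,
    global_field_separator: Optional[str] = None,
    pyingest_file_config: Optional[Dict[str, Dict[str, Any]]] = None,
) -> Dict[str, Any]:
    """Return the settings one file would end up with (hypothetical helper)."""
    # Start from the global values.
    settings: Dict[str, Any] = {"batch_size": global_batch_size}
    if global_field_separator is not None:
        settings["field_separator"] = global_field_separator

    # Look up the file-specific overrides, if any.
    overrides = (pyingest_file_config or {}).get(file_name, {})
    unknown = set(overrides) - SUPPORTED_KEYS
    if unknown:
        # Assumption for this sketch: unknown keys are treated as errors.
        raise ValueError(f"Unsupported parameters for {file_name}: {sorted(unknown)}")

    # File-specific values win over global ones.
    settings.update(overrides)
    return settings


pyingest_file_config = {
    "A.csv": {"field_separator": "|", "skip_file": False, "skip_records": 5},
    "B.csv": {"skip_file": True, "batch_size": 1234},
}

a_settings = effective_file_settings(
    "A.csv", global_field_separator=",", pyingest_file_config=pyingest_file_config
)
b_settings = effective_file_settings(
    "B.csv", global_field_separator=",", pyingest_file_config=pyingest_file_config
)
```

In this sketch A.csv keeps the global batch size of 100 but uses "|" as its separator, while B.csv keeps the global "," separator and uses its own batch size of 1234.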
generate_config_string
Generate the PyIngest YAML configuration as a string.
Returns
-------
str
The YAML configuration as a string.
generate_config_yaml
Generate the PyIngest YAML config file.
Parameters
----------
file_name : str, optional
Name of the file, by default "pyingest_config"
generate_constraints_file
Generate a .cypher file containing the generated constraints.
Parameters
----------
file_name : str, optional
Name of the file, by default "constraints.cypher"
generate_constraints_string
Generate a single string containing all constraints.
Returns
-------
str
The constraints as a string.
generate_cypher_file
Generate a .cypher file containing the generated ingestion code.
Parameters
----------
file_name : str, optional
Name of the file, by default "ingest_code.cypher"
generate_cypher_string
Generate a single string containing all ingestion code.
Returns
-------
str
The Cypher as a string.