Pipelines

Pipelines are the engines that make Superpipe run. A pipeline is a series of steps chained together that operates on a dataframe. A pipeline can optionally take an evaluation function, which receives a row and can run arbitrary Python code. An evaluation function must return a boolean.
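As a minimal sketch of the boolean contract, the function below compares a predicted label with a ground-truth label. The column names `predicted_category` and `category_new` are taken from the example at the end of this page. The plain dictionaries standing in for dataframe rows, and the choice to treat a missing value as a failure, are assumptions made so the snippet runs without Superpipe installed.

```python
def evaluate(row) -> bool:
    """Return True when the predicted category matches the labelled one."""
    predicted = row["predicted_category"]
    expected = row["category_new"]
    # Treat a missing value as a failed row rather than raising (assumption).
    if predicted is None or expected is None:
        return False
    # Compare case-insensitively and ignore surrounding whitespace.
    return predicted.strip().lower() == expected.strip().lower()


# Plain dicts stand in for dataframe rows (illustration only).
rows = [
    {"predicted_category": "Furniture", "category_new": "furniture"},
    {"predicted_category": "Toys", "category_new": "Garden"},
    {"predicted_category": None, "category_new": "Garden"},
]
results = [evaluate(row) for row in rows]
print(results)
```

A named function like this can be passed wherever a lambda would be, and it leaves room for the extra checks that a one-line lambda makes awkward.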

Pipeline statistics

A pipeline object has associated pipeline statistics.

Stat	Description
score	Accuracy score of the pipeline as defined by the evaluation function.
input_tokens	Total number of input tokens used by the pipeline, broken down by model.
output_tokens	Total number of output tokens used by the pipeline, broken down by model.
input_cost	Total input cost of the pipeline, broken down by model.
output_cost	Total output cost of the pipeline, broken down by model.
num_success	Number of successful rows.
num_failure	Number of unsuccessful rows.
total_latency	Total latency of the pipeline.
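To make the table concrete, here is a rough sketch of how such statistics could be aggregated. It assumes that `score` is the fraction of rows for which the evaluation function returned True and that the per-model statistics are dictionaries keyed by model name. Both are assumptions for illustration, not Superpipe's actual implementation, and the records below are made-up sample values.

```python
from collections import Counter

# Hypothetical per-call records: (model name, input tokens, output tokens).
calls = [
    ("gpt-3.5-turbo", 120, 15),
    ("gpt-3.5-turbo", 98, 12),
    ("gpt-4", 210, 30),
    ("gpt-4", 190, 25),
]
# Hypothetical per-row results returned by an evaluation function.
evaluations = [True, True, False, True]

# Token totals broken down by model.
input_tokens = Counter()
output_tokens = Counter()
for model, n_in, n_out in calls:
    input_tokens[model] += n_in
    output_tokens[model] += n_out

# Accuracy as the share of rows that evaluated to True (assumed definition).
num_success = sum(1 for passed in evaluations if passed)
num_failure = len(evaluations) - num_success
score = num_success / len(evaluations) if evaluations else 0.0

print(dict(input_tokens), dict(output_tokens), score)
```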

Pipeline methods

update_params()

pipeline.update_params() takes a dictionary that maps step names to the parameters to change for that step. For example, to update the categorize step of the pipeline to use GPT-4, call update_params with the step name as the key and a nested dictionary that has model as its key.

categorizer.update_params({
  "categorize": {
    "model": models.gpt4
  }
})
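The shape of the parameters dictionary can be illustrated with a small stand-in that uses plain dictionaries. This is a sketch of one plausible merge behaviour, where only the named keys are overridden and an unknown step name raises an error. It is not Superpipe's implementation: the helper `apply_param_updates`, the `short_description` step name, and the `temperature` parameter are all hypothetical.

```python
def apply_param_updates(step_params, updates):
    """Return a copy of step_params with the per-step overrides applied."""
    # Copy each step's parameters so the original mapping is left untouched.
    merged = {name: dict(params) for name, params in step_params.items()}
    for step_name, overrides in updates.items():
        if step_name not in merged:
            # Rejecting unknown steps is an assumption made for this sketch.
            raise KeyError(f"unknown step: {step_name}")
        # Override only the keys that were passed; keep the rest.
        merged[step_name].update(overrides)
    return merged


# Hypothetical current parameters, keyed by step name.
current = {
    "short_description": {"model": "gpt-3.5-turbo"},
    "categorize": {"model": "gpt-3.5-turbo", "temperature": 0},
}
updated = apply_param_updates(current, {"categorize": {"model": "gpt-4"}})
print(updated["categorize"])
```

Keying the outer dictionary by step name means several steps can be reconfigured in a single call while untouched steps keep their existing parameters.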

Example

The full code for this example is in the comparing pipelines example. Only the pipeline definition is shown here.

# A row is correct when the predicted category matches the labelled one, ignoring case.
def evaluate(row):
  return row['predicted_category'].lower() == row['category_new'].lower()

categorizer = pipeline.Pipeline([
  short_description_step,
  embedding_search_step,
  categorize_step,
  select_category_step
], evaluation_fn=evaluate)

categorizer.run(test_df)