> For the complete documentation index, see [llms.txt](https://plato-solutions.gitbook.io/artifician/llms.txt). Markdown versions of documentation pages are available by appending `.md` to page URLs; this page is available as [Markdown](https://plato-solutions.gitbook.io/artifician/advanced_concepts/processor_chaining.md).

# Processor Chaining

## Overview

Discover Processor Chaining – a fundamental concept in the Artifician library that revolutionizes the way you handle data processing workflows. Whether it's complex data transformations, in-depth text analysis, or sequential data manipulations, Processor Chaining is the conceptual backbone that ensures smooth and efficient operations. It's a versatile approach, applicable not only to NLP tasks but to a wide array of sequential data operations.

## Key Features

* **Sequential Execution**: Processor Chaining provides a structured way to pass data through a series of processors, maintaining a logical and efficient flow.
* **Flexible Pipeline Construction**: Adapt your data processing pipeline to your project's evolving requirements by adding, removing, or rearranging processors.
* **Broad Applicability**: This concept is compatible with a diverse range of processors, making it an ideal choice for managing complex data workflows.

## Syntax Showcase

Processor chaining can be achieved in two different ways:

* **Chain Processors with `.then()`**:

  ```python
   pipeline = TextCleaningProcessor().then(TokenizationProcessor()).then(StemLemProcessor())
  ```

This approach lets you link processors together in a logical sequence, ideal for step-by-step construction of a pipeline.

* **Chain Processors with `.then()`, Using Lists**:

  ```python
  pipeline = TextCleaningProcessor().then([TokenizationProcessor(), StemLemProcessor()])
  ```

This method enhances flexibility, enabling you to add multiple processors at once using a list - perfect for incorporating groups of processors efficiently.

## Example Scenario: NLP Processing Pipeline

Imagine you're preparing textual data for NLP analysis. The raw data requires cleaning, tokenizing, stemming or lemmatizing, and stop word removal. Processor Chaining simplifies this daunting task, allowing seamless integration of these steps into an effective pipeline.

Now, let's see how you can effortlessly achieve this with processor chaining.

## Building an NLP Pipeline with PCM

Here's how Processor Chaining can be used to create a comprehensive NLP processing pipeline:

```python
from artifician import Dataset, FeatureDefinition
from artifician.processors import *

# Initialize individual processors
cleaner = TextCleaningProcessor()
tokenizer = TokenizationProcessor(method='word')
stemlem = StemLemProcessor()
stopword = StopWordsRemoverProcessor()

# Define a extractor to retrieve text
def get_text(text): return text

# Sample texts for processing
sample_texts = [
    "The quick brown fox jumps over the lazy dog!",
    "<p>HTML tags should be removed.</p>",
]

# Setup Dataset and FeatureDefinition
dataset = Dataset()
feature = FeatureDefinition(get_text, [dataset])

# Create and configure the processing pipeline
processing_pipeline = cleaner.then(tokenizer).then(stemlem).then(stopword)
processing_pipeline.subscribe(feature)

# Process the sample texts
datastore = dataset.add_samples(sample_texts)
```

### Output

After running our NLP pipeline with processor chaining, we successfully process the text samples, which results in a neatly organized DataFrame. The DataFrame showcases a side-by-side comparison of the original texts and their processed counterparts, illustrating the effectiveness of our processing pipeline:

| Original Text                                | Processed Text                                    |
| -------------------------------------------- | ------------------------------------------------- |
| The quick brown fox jumps over the lazy dog! | \['quick', 'brown', 'fox', 'jump', 'lazy', 'dog'] |
| The HTML tags should be removed.             | \['html', 'tag', 'remove']                        |


---

# Agent Instructions
This documentation is published with GitBook. GitBook is the documentation platform designed so that both humans and AI agents can read, navigate, and reason over technical content effectively. Learn more at gitbook.com.

## Querying This Documentation
If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter, and the optional `goal` query parameter:

```
GET https://plato-solutions.gitbook.io/artifician/advanced_concepts/processor_chaining.md?ask=<question>&goal=<endgoal>
```

`ask` is the immediate question: it should be specific, self-contained, and written in natural language.
`goal` is optional and describes the broader end goal you are ultimately trying to accomplish on behalf of the user. GitBook uses it to tailor the answer towards what is most useful for that goal.

The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
