Artifician
Github Repository
  • Introduction
    • “Turn your data preparation nightmares into a dream.”
    • Why Artifician?
    • Simple Example
    • Without Artifician
    • Using Artifician
    • Output
  • Getting Started with Artifician
    • Pre-requisites
    • Installation
    • Using pip
    • Using conda
    • Verify Installation
    • Next Steps
  • Quick Start
    • Define Extractor
    • Initialize components
    • Subscriptions
    • Dataset Preparation
    • Output
  • Advanced Concepts
    • Processor Chaining
      • Overview
      • Key Features
      • Syntax Showcase
      • Example Scenario: NLP Processing Pipeline
      • Building an NLP Pipeline with processor chaining
      • Output
    • Defining Custom Extractors
      • Introduction
      • Why Custom Extractors?
      • How Extractors Work
      • Example of a Simple Extractor
      • Integrating Custom Extractors
      • Advanced Usage
      • Conclusion
    • Defining Custom Processors
      • Introduction
      • Why Custom Processors?
      • How Processors Work
      • Example of a Simple Processor
      • Integrating Custom Processors
      • Advanced Usage
      • Conclusion
    • Library Architecture
      • Events
      • Dataset
      • Feature Definition
      • Processors
      • Extractors
  • API Reference
Powered by GitBook
On this page
  • Events
  • Dataset
  • Feature Definition
  • Processors
  • Extractors

Was this helpful?

Edit on GitHub
  1. Advanced Concepts

Library Architecture

PreviousDefining Custom ProcessorsNextAPI Reference

Last updated 1 year ago

Was this helpful?

This section offers a deep dive into the core components that constitute the Artifician library, explaining their roles, functionalities, and how they interact with each other.

Events

Events serve as the backbone of Artifician’s event-driven architecture. They provide the hooks that allow different components to communicate in a decoupled manner. In this system, the entity that triggers an event is known as the “publisher,” and the entity that listens to the event is the “observer.” Essentially, events are Python functions that act as communication channels between different parts of your application.

Dataset

Feature Definition

Feature Definitions are the workhorses of the Artifician library. They are responsible for extracting specific features from raw data using an “extractor” function. This extractor is a custom function that you define to pick out the data you are interested in. Feature Definitions act as both publishers and observers: they can emit their events and also listen to events emitted by other publishers, including the Dataset.

Processors

Processors play a crucial role in shaping the data into a usable format. They can perform a wide range of operations, from normalization to more complex feature engineering tasks. Processors listen to events triggered by publishers like Feature Definitions and act upon the data accordingly. They are extendable, allowing you to define custom processors that fit your specific needs.

Extractors

Extractors are user-defined functions that you pass into the Feature Definition component. Their role is to extract specific features from a given sample of raw data. Think of them as the first line of data transformation, essentially turning unstructured or semi-structured data into something that can be processed further down the pipeline. They serve as the bridge between the raw data and the Feature Definitions, allowing you to tailor the data extraction process to your specific needs.

The Dataset is more than just a storage unit; it serves as the core structure that holds and manages the prepared data. It emits events that can be observed by other components like Feature Definitions and Processors. Internally, the Dataset utilizes a object to hold the prepared data, offering a familiar interface for data manipulation and analysis.

pandas.DataFrame
image