# API Reference

## Table of Contents

* [artifician](#artifician)
* [artifician.feature\_definition](#artifician.feature_definition)
  * [FeatureDefinition](#artifician.feature_definition.FeatureDefinition)
    * [\_\_init\_\_](#artifician.feature_definition.FeatureDefinition.__init__)
    * [process](#artifician.feature_definition.FeatureDefinition.process)
    * [map](#artifician.feature_definition.FeatureDefinition.map)
    * [observe](#artifician.feature_definition.FeatureDefinition.observe)
    * [subscribe](#artifician.feature_definition.FeatureDefinition.subscribe)
* [artifician.dataset](#artifician.dataset)
  * [Dataset](#artifician.dataset.Dataset)
    * [add\_samples](#artifician.dataset.Dataset.add_samples)
    * [observe](#artifician.dataset.Dataset.observe)
    * [post\_process](#artifician.dataset.Dataset.post_process)
* [artifician.processors.chain](#artifician.processors.chain)
  * [chain](#artifician.processors.chain.chain)
    * [\_\_init\_\_](#artifician.processors.chain.chain.__init__)
    * [then](#artifician.processors.chain.chain.then)
    * [process](#artifician.processors.chain.chain.process)
    * [subscribe](#artifician.processors.chain.chain.subscribe)
* [artifician.processors.mapper](#artifician.processors.mapper)
  * [Mapper](#artifician.processors.mapper.Mapper)
    * [\_\_init\_\_](#artifician.processors.mapper.Mapper.__init__)
    * [process](#artifician.processors.mapper.Mapper.process)
    * [subscribe](#artifician.processors.mapper.Mapper.subscribe)
  * [FeatureMap](#artifician.processors.mapper.FeatureMap)
    * [get\_value\_id](#artifician.processors.mapper.FeatureMap.get_value_id)
* [artifician.processors](#artifician.processors)
* [artifician.processors.processor](#artifician.processors.processor)
  * [Processor](#artifician.processors.processor.Processor)
    * [process](#artifician.processors.processor.Processor.process)
    * [subscribe](#artifician.processors.processor.Processor.subscribe)
    * [then](#artifician.processors.processor.Processor.then)
* [artifician.processors.text](#artifician.processors.text)
* [artifician.processors.text.text\_cleaner](#artifician.processors.text.text_cleaner)
  * [TextCleaningProcessor](#artifician.processors.text.text_cleaner.TextCleaningProcessor)
    * [\_\_init\_\_](#artifician.processors.text.text_cleaner.TextCleaningProcessor.__init__)
    * [process](#artifician.processors.text.text_cleaner.TextCleaningProcessor.process)
    * [subscribe](#artifician.processors.text.text_cleaner.TextCleaningProcessor.subscribe)
* [artifician.processors.text.stop\_word\_remover](#artifician.processors.text.stop_word_remover)
  * [StopWordsRemoverProcessor](#artifician.processors.text.stop_word_remover.StopWordsRemoverProcessor)
    * [\_\_init\_\_](#artifician.processors.text.stop_word_remover.StopWordsRemoverProcessor.__init__)
    * [process](#artifician.processors.text.stop_word_remover.StopWordsRemoverProcessor.process)
    * [subscribe](#artifician.processors.text.stop_word_remover.StopWordsRemoverProcessor.subscribe)
* [artifician.processors.text.tokenizer](#artifician.processors.text.tokenizer)
  * [TokenizationProcessor](#artifician.processors.text.tokenizer.TokenizationProcessor)
    * [\_\_init\_\_](#artifician.processors.text.tokenizer.TokenizationProcessor.__init__)
    * [process](#artifician.processors.text.tokenizer.TokenizationProcessor.process)
    * [subscribe](#artifician.processors.text.tokenizer.TokenizationProcessor.subscribe)
* [artifician.processors.text.stemlemtizer](#artifician.processors.text.stemlemtizer)
  * [StemLemProcessor](#artifician.processors.text.stemlemtizer.StemLemProcessor)
    * [\_\_init\_\_](#artifician.processors.text.stemlemtizer.StemLemProcessor.__init__)
    * [process](#artifician.processors.text.stemlemtizer.StemLemProcessor.process)
    * [subscribe](#artifician.processors.text.stemlemtizer.StemLemProcessor.subscribe)
* [artifician.processors.normalizer](#artifician.processors.normalizer)
  * [Normalizer](#artifician.processors.normalizer.Normalizer)
    * [\_\_init\_\_](#artifician.processors.normalizer.Normalizer.__init__)
    * [process](#artifician.processors.normalizer.Normalizer.process)
    * [subscribe](#artifician.processors.normalizer.Normalizer.subscribe)
  * [NormalizerStrategy](#artifician.processors.normalizer.NormalizerStrategy)
  * [PropertiesNormalizer](#artifician.processors.normalizer.PropertiesNormalizer)
    * [normalize](#artifician.processors.normalizer.PropertiesNormalizer.normalize)
  * [PathsNormalizer](#artifician.processors.normalizer.PathsNormalizer)
    * [get\_path\_values](#artifician.processors.normalizer.PathsNormalizer.get_path_values)
    * [normalize](#artifician.processors.normalizer.PathsNormalizer.normalize)
  * [KeyValuesNormalizer](#artifician.processors.normalizer.KeyValuesNormalizer)
    * [normalize\_key\_values](#artifician.processors.normalizer.KeyValuesNormalizer.normalize_key_values)
    * [normalize](#artifician.processors.normalizer.KeyValuesNormalizer.normalize)
  * [StrategySelector](#artifician.processors.normalizer.StrategySelector)
    * [get\_paths\_delimiter](#artifician.processors.normalizer.StrategySelector.get_paths_delimiter)
    * [get\_key\_values\_delimiter](#artifician.processors.normalizer.StrategySelector.get_key_values_delimiter)
    * [get\_properties\_delimiter](#artifician.processors.normalizer.StrategySelector.get_properties_delimiter)
    * [select](#artifician.processors.normalizer.StrategySelector.select)
* [artifician.extractors.text\_extractors.keyword\_extractor](#artifician.extractors.text_extractors.keyword_extractor)
  * [KeywordExtractor](#artifician.extractors.text_extractors.keyword_extractor.KeywordExtractor)
    * [\_\_init\_\_](#artifician.extractors.text_extractors.keyword_extractor.KeywordExtractor.__init__)
* [artifician.extractors.html\_extractors](#artifician.extractors.html_extractors)
  * [get\_node\_text](#artifician.extractors.html_extractors.get_node_text)
  * [get\_node\_attribute](#artifician.extractors.html_extractors.get_node_attribute)
  * [get\_parent\_node\_text](#artifician.extractors.html_extractors.get_parent_node_text)
  * [get\_child\_node\_text](#artifician.extractors.html_extractors.get_child_node_text)
  * [count\_child\_nodes](#artifician.extractors.html_extractors.count_child_nodes)
  * [get\_sibling\_node\_text](#artifician.extractors.html_extractors.get_sibling_node_text)
  * [get\_parent\_attribute](#artifician.extractors.html_extractors.get_parent_attribute)
  * [get\_child\_attribute](#artifician.extractors.html_extractors.get_child_attribute)
* [artifician.extractors](#artifician.extractors)

## artifician

## artifician.feature\_definition

### FeatureDefinition Objects

```python
class FeatureDefinition()
```

Contains all the functionality for preparing a single feature.

**Attributes**:

* `value` *Any* - The value of the feature.
* `cached` *dict* - Cached observables for different events.
* `extractor` *Callable* - Function to extract the feature value from a sample.
* `EVENT_PROCESSED` *Callable* - Event that processes the feature.
* `MAP_VALUES` *Callable* - Event that maps values of the feature.
* `extractor_parameters` *Tuple* - Parameters for the extractor function.

**\_\_init\_\_**

```python
def __init__(extractor: Callable = lambda sample: sample,
             subscribe_to: List = None,
             *extractor_parameters)
```

Initializes a FeatureDefinition instance.

**Arguments**:

* `extractor` *Callable, optional* - Function to extract feature value.
* `subscribe_to` *List* - List of publishers to subscribe to.
* `extractor_parameters` - Additional parameters for the extractor.

**Raises**:

* `ValueError` - If no publishers are provided to subscribe to.
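
The extractor contract can be sketched independently of the library: the default extractor is the identity function, and any `extractor_parameters` are forwarded to the extractor along with the sample. The function name below is illustrative, not part of the library.

```python
def build_feature_value(sample, extractor=lambda s: s, *extractor_parameters):
    """Apply the extractor to a sample, forwarding any extra parameters."""
    return extractor(sample, *extractor_parameters)

# The default extractor returns the sample unchanged.
print(build_feature_value({"title": "Hello"}))

# A custom extractor plus an extractor parameter pulls out one field.
print(build_feature_value({"title": "Hello"}, lambda s, key: s[key], "title"))  # → Hello
```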

**process**

```python
def process(publisher, sample: Any) -> None
```

Processes the sample to build the feature value.

**Arguments**:

* `sample` *Any* - The sample data.
* `publisher` - The instance of the publisher.

**map**

```python
def map(feature_value: Any) -> None
```

Maps the feature value into an int or list of ints.

**Arguments**:

* `feature_value` *Any* - The feature value to be mapped.

**observe**

```python
def observe(event: Callable) -> Subject
```

Builds and returns an observable for a given event.

**Arguments**:

* `event` *Callable* - The function to create an observable from.

**Returns**:

* `Observable` - An observable for the given event.

**subscribe**

```python
def subscribe(publisher, pool_scheduler=None) -> None
```

Defines logic for subscribing to an event in a publisher.

**Arguments**:

* `publisher` - The publisher instance.
* `pool_scheduler` *optional* - The scheduler instance for concurrency.

## artifician.dataset

### Dataset Objects

```python
class Dataset()
```

Dataset contains all the functionality for preparing Artifician data. It observes events and stores all processed data in a Pandas DataFrame.

**Attributes**:

* `cached` *dict* - Cached observables for different events.
* `datastore` *pd.DataFrame* - DataFrame to store all samples.
* `PREPARE_DATASET` *Callable* - Event to prepare the dataset.
* `POST_PROCESS` *Callable* - Event for post-processing actions on the dataset.

**add\_samples**

```python
def add_samples(samples: Any) -> pd.DataFrame
```

Adds samples to the datastore.

**Arguments**:

* `samples` *Any* - Artifician data to be added.

**Returns**:

* `pd.DataFrame` - The updated dataset.

**Raises**:

* `TypeError` - If the input data is not a list.
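
The documented type check can be sketched as follows. This is a simplified stand-in that uses a plain list rather than the library's pandas datastore; the function name mirrors the API but the body is illustrative.

```python
def add_samples(datastore, samples):
    # Per the documented contract, samples must be provided as a list.
    if not isinstance(samples, list):
        raise TypeError("samples must be provided as a list")
    datastore.extend(samples)
    return datastore

store = []
print(add_samples(store, [{"id": 1}, {"id": 2}]))  # → [{'id': 1}, {'id': 2}]
```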

**observe**

```python
def observe(event)
```

Builds and returns an observable for a given event.

**Arguments**:

* `event` *Callable* - Function to create an observable from.

**Returns**:

* `rx.subject.Subject` - Observable for the given event.

**post\_process**

```python
def post_process()
```

This event should be called after Artifician data is prepared. Listeners to the post\_process event can perform collective actions on the dataset.

## artifician.processors.chain

### chain Objects

```python
class chain()
```

Manages a chain of processors.

This class handles the sequential execution of a chain of processors and can subscribe to a publisher to trigger the processing.

**Attributes**:

* `processors` *list* - A list of processors in the chain.

**\_\_init\_\_**

```python
def __init__(processors=None) -> None
```

Initializes the chain with an optional list of processors.

**Arguments**:

* `processors` *list, optional* - An initial list of processors to be managed.

**then**

```python
def then(next_processor) -> 'chain'
```

Adds a processor to the end of the chain.

**Arguments**:

* `next_processor` *Processor* - The processor to add to the end of the chain.

**Returns**:

* `chain` - The chain instance, allowing further chaining.

**process**

```python
def process(publisher, data: any) -> any
```

Processes data sequentially through the chain of processors.

**Arguments**:

* `publisher` - The publisher instance.
* `data` - The data to be processed by the chain.

**Returns**:

The final processed data after passing through all processors.
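
Sequential execution through a chain can be sketched like this. It is a minimal stand-in, not the library's implementation; each processor is modeled as a callable taking `(publisher, data)`.

```python
class Chain:
    def __init__(self, processors=None):
        self.processors = list(processors or [])

    def then(self, next_processor):
        """Append a processor and return the chain for fluent composition."""
        self.processors.append(next_processor)
        return self

    def process(self, publisher, data):
        """Run the data through every processor in order."""
        for processor in self.processors:
            data = processor(publisher, data)
        return data

pipeline = Chain().then(lambda pub, s: s.strip()).then(lambda pub, s: s.upper())
print(pipeline.process(None, "  hello  "))  # → HELLO
```

Returning `self` from `then` is what makes the fluent `chain.then(...).then(...)` style possible.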

**subscribe**

```python
def subscribe(publisher, pool_scheduler=None) -> None
```

Subscribes the processor chain to a publisher (e.g., a FeatureDefinition).

The publisher will trigger the processing of the chain.

**Arguments**:

* `publisher` - The publisher (e.g., a FeatureDefinition) to subscribe to.

## artifician.processors.mapper

### Mapper Objects

```python
class Mapper(processor.Processor)
```

Mapper is a processor responsible for mapping/converting feature values to integers.

**Attributes**:

* `feature_map` *FeatureMap* - Feature map containing the `{value: id}` dictionary.
* `map_key_values` *bool* - If True, map both keys and values; if False, map only values.

**\_\_init\_\_**

```python
def __init__(feature_map, subscribe_to=None, map_key_values=False)
```

Initializes the Mapper by setting up the feature map.

**Arguments**:

* `feature_map` *FeatureMap* - A FeatureMap instance.
* `map_key_values` *bool* - If True, map both keys and values; if False, map only values.

**process**

```python
def process(publisher, feature_value)
```

Updates the publisher's feature value by mapping the feature value to an integer.

**Arguments**:

* `publisher` *object* - Instance of the publisher.
* `feature_value` *str* - The feature value to be mapped.

**Returns**:

* `value_id` *int* - The mapped integer ID.

**subscribe**

```python
def subscribe(publisher, pool_scheduler=None)
```

Defines logic for subscribing to an event in the publisher.

**Arguments**:

* `publisher` *object* - instance of publisher
* `pool_scheduler` *rx.scheduler.ThreadPoolScheduler* - scheduler instance for concurrency

**Returns**:

None

### FeatureMap Objects

```python
class FeatureMap()
```

Converts a given value to an integer ID.

**Attributes**:

* `values_map` *dict* - Dictionary mapping `{value: id}`.

**get\_value\_id**

```python
def get_value_id(value)
```

Returns the ID of the given value in the values map. The value is first converted to a string, since values of any datatype can be represented as strings, which keeps dictionary lookups consistent.

**Arguments**:

* `value` *any* - value

**Returns**:

* `value_id` *int* - ID of the value
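
The value-to-ID mapping can be sketched as a minimal illustration of the documented behavior: values are converted to `str` before lookup. How IDs are assigned to unseen values is an assumption in this sketch, not documented above.

```python
class FeatureMap:
    def __init__(self):
        self.values_map = {}  # {str(value): id}

    def get_value_id(self, value):
        key = str(value)  # any datatype can be represented as a string
        if key not in self.values_map:
            self.values_map[key] = len(self.values_map) + 1
        return self.values_map[key]

fm = FeatureMap()
print(fm.get_value_id("red"))  # → 1
print(fm.get_value_id(3.14))   # → 2
print(fm.get_value_id("red"))  # → 1 (same value, same ID)
```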

## artifician.processors

## artifician.processors.processor

### Processor Objects

```python
class Processor(ABC)
```

Interface for processors in the Artifician library, with support for processor chaining.

This abstract class defines the interface for processors, including methods for processing data and subscribing to publishers, along with the ability to chain processors.

**process**

```python
@abstractmethod
def process(publisher, *data)
```

Process the data and update the publisher with the processed values.

**Arguments**:

* `publisher` - The publisher that is updated with the processed data.
* `data` - The data to be processed.

**subscribe**

```python
@abstractmethod
def subscribe(publisher, pool_scheduler=None)
```

Subscribe the processor to a publisher (e.g., FeatureDefinition).

**Arguments**:

* `publisher` - The publisher to subscribe to.
* `pool_scheduler` *optional* - The scheduler to be used for subscription.

**then**

```python
def then(next_processor)
```

Link this processor to the next one in the chain.

**Arguments**:

* `next_processor` - The next processor to add to the chain.

**Returns**:

* `chain` - chain of processors

**Raises**:

* `TypeError` - If the next\_processor is not a valid processor instance.
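
A custom processor implements both abstract methods; `then` is inherited from the base class. The subclass below is hypothetical and the base class is a simplified mirror of the documented interface, shown only to illustrate the shape of an implementation.

```python
from abc import ABC, abstractmethod

class Processor(ABC):
    """Simplified mirror of the documented interface."""

    @abstractmethod
    def process(self, publisher, *data):
        ...

    @abstractmethod
    def subscribe(self, publisher, pool_scheduler=None):
        ...

class UppercaseProcessor(Processor):
    """Hypothetical processor: uppercases incoming text."""

    def process(self, publisher, text):
        return text.upper()

    def subscribe(self, publisher, pool_scheduler=None):
        pass  # subscription wiring omitted in this sketch

print(UppercaseProcessor().process(None, "abc"))  # → ABC
```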

## artifician.processors.text

## artifician.processors.text.text\_cleaner

### TextCleaningProcessor Objects

```python
class TextCleaningProcessor(Processor)
```

Processor for cleaning and preprocessing text data.

Configurable attributes for various cleaning operations.

**\_\_init\_\_**

```python
def __init__(lowercase=True,
             remove_punctuation=True,
             remove_numbers=True,
             strip_whitespace=True,
             remove_html_tags=True,
             remove_urls=True,
             subscribe_to=None)
```

Initialize a TextCleaningProcessor object.

**Arguments**:

* `lowercase` *bool* - Flag to convert text to lowercase.
* `remove_punctuation` *bool* - Flag to remove punctuation.
* `remove_numbers` *bool* - Flag to remove numbers.
* `strip_whitespace` *bool* - Flag to strip extra whitespaces.
* `remove_html_tags` *bool* - Flag to remove HTML tags.
* `remove_urls` *bool* - Flag to remove URLs.
* `subscribe_to` *list* - Optional list of publishers to subscribe to.

**process**

```python
def process(publisher, text: Union[str, List[str]]) -> Union[str, List[str]]
```

Process the text or list of texts to clean and preprocess.

**Arguments**:

* `publisher` - The publisher associated with the processor.
* `text` *Union\[str, List\[str]]* - The text or list of texts to be processed.

**Returns**:

Union\[str, List\[str]]: Cleaned and preprocessed text.
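
The cleaning flags can be approximated with plain regular expressions. This is a simplified stand-in, not the library's implementation; the exact patterns and the order the flags are applied in are assumptions.

```python
import re

def clean_text(text, lowercase=True, remove_punctuation=True,
               remove_numbers=True, strip_whitespace=True,
               remove_html_tags=True, remove_urls=True):
    if remove_html_tags:
        text = re.sub(r"<[^>]+>", " ", text)   # drop HTML tags
    if remove_urls:
        text = re.sub(r"https?://\S+", " ", text)
    if lowercase:
        text = text.lower()
    if remove_numbers:
        text = re.sub(r"\d+", " ", text)
    if remove_punctuation:
        text = re.sub(r"[^\w\s]", " ", text)
    if strip_whitespace:
        text = re.sub(r"\s+", " ", text).strip()  # collapse runs of whitespace
    return text

print(clean_text("<p>Visit https://x.io NOW, 42!</p>"))  # → visit now
```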

**subscribe**

```python
def subscribe(publisher, pool_scheduler=None)
```

Subscribe to a publisher for event-driven processing.

**Arguments**:

* `publisher` *object* - The publisher to subscribe to.
* `pool_scheduler` *optional* - Scheduler instance for concurrency.

**Returns**:

None

## artifician.processors.text.stop\_word\_remover

### StopWordsRemoverProcessor Objects

```python
class StopWordsRemoverProcessor(Processor)
```

Processor for removing stop words from text data.

**Attributes**:

* `stop_words` *set* - A set of stop words to be removed.

**\_\_init\_\_**

```python
def __init__(custom_stop_words: List[str] = None, subscribe_to=None)
```

Initialize a StopWordsRemoverProcessor object.

**Arguments**:

* `custom_stop_words` *List\[str]* - Optional list of custom stop words.
* `subscribe_to` *list* - Optional list of publishers to subscribe to.

**process**

```python
def process(publisher, text: Union[str, List[str]]) -> Union[str, List[str]]
```

Process the text or list of texts to remove stop words.

**Arguments**:

* `publisher` - The publisher associated with the processor.
* `text` *Union\[str, List\[str]]* - The text or list of texts to be processed.

**Returns**:

Union\[str, List\[str]]: Text after stop words removal.

**Raises**:

* `ValueError` - If the input text is None or an empty list.
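
Stop-word removal reduces to filtering tokens against a set. A simplified sketch follows; the default stop-word list here is illustrative, and the library's actual defaults may differ.

```python
DEFAULT_STOP_WORDS = {"a", "an", "the", "is", "on", "and", "of"}

def remove_stop_words(text, custom_stop_words=None):
    if not text:
        raise ValueError("Input text must not be None or empty")
    stop_words = DEFAULT_STOP_WORDS | set(custom_stop_words or [])
    if isinstance(text, list):
        return [remove_stop_words(t, custom_stop_words) for t in text]
    return " ".join(w for w in text.split() if w.lower() not in stop_words)

print(remove_stop_words("The cat is on a mat"))  # → cat mat
```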

**subscribe**

```python
def subscribe(publisher, pool_scheduler=None)
```

Subscribe to a publisher for event-driven processing.

**Arguments**:

* `publisher` *object* - The publisher to subscribe to.
* `pool_scheduler` *optional* - Scheduler instance for concurrency.

**Returns**:

None

## artifician.processors.text.tokenizer

### TokenizationProcessor Objects

```python
class TokenizationProcessor(Processor)
```

Tokenization Processor for splitting text into tokens.

**Attributes**:

* `method` *str* - Method to use for tokenization ('word' or 'sentence').
* `nlp` - spaCy language model for processing text.

**\_\_init\_\_**

```python
def __init__(method: str = 'word', subscribe_to=None)
```

Initialize a TokenizationProcessor object.

**Arguments**:

* `method` *str* - Method to use for tokenization ('word' or 'sentence').
* `subscribe_to` *list* - Optional list of publishers to subscribe to.

**process**

```python
def process(
        publisher, text: Union[str, List[str],
                               None]) -> Union[List[str], List[List[str]]]
```

Process the text or list of texts and split it into tokens.

**Arguments**:

* `text` *Union\[str, List\[str], None]* - The text or list of texts to be tokenized.

**Returns**:

Union\[List\[str], List\[List\[str]]]: A list of tokens or list of lists of tokens.

**Raises**:

* `ValueError` - If the input text is None or an empty list.
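
The two tokenization modes can be approximated without spaCy. This is a rough stand-in: the processor actually uses a spaCy language model, which handles many cases this regex sketch does not.

```python
import re

def tokenize(text, method="word"):
    if not text:
        raise ValueError("Input text must not be None or empty")
    if isinstance(text, list):
        return [tokenize(t, method) for t in text]
    if method == "sentence":
        # Split after sentence-ending punctuation followed by whitespace.
        return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return re.findall(r"\w+", text)

print(tokenize("Hello world."))                           # → ['Hello', 'world']
print(tokenize("Hi there. Bye now.", method="sentence"))  # → ['Hi there.', 'Bye now.']
```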

**subscribe**

```python
def subscribe(publisher, pool_scheduler=None)
```

Defines logic for subscribing to an event in the publisher.

**Arguments**:

* `publisher` *object* - instance of the publisher
* `pool_scheduler` *rx.scheduler.ThreadPoolScheduler* - scheduler instance for concurrency

**Returns**:

None

## artifician.processors.text.stemlemtizer

### StemLemProcessor Objects

```python
class StemLemProcessor(Processor)
```

Processor for applying stemming and lemmatization to text data.

**Attributes**:

* `mode` *str* - Mode of operation ('stemming' or 'lemmatization').
* `nlp` - spaCy language model for lemmatization.
* `stemmer` - NLTK stemmer for stemming.

**\_\_init\_\_**

```python
def __init__(mode: str = 'lemmatization', subscribe_to=None)
```

Initialize a StemLemProcessor object.

**Arguments**:

* `mode` *str* - Operation mode ('stemming' or 'lemmatization').
* `subscribe_to` *list* - Optional list of publishers to subscribe to.

**process**

```python
def process(publisher, text: Union[str, List[str]]) -> Union[str, List[str]]
```

Process the text or list of tokens for stemming or lemmatization.

**Arguments**:

* `publisher` - The publisher associated with the processor.
* `text` *Union\[str, List\[str]]* - The text or list of tokens to be processed.

**Returns**:

Union\[str, List\[str]]: Processed text or list of processed tokens.
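
As a rough illustration of what stemming does, here is a toy suffix stripper. It bears no resemblance to the NLTK stemmer or spaCy lemmatizer the processor actually uses; it only shows the input/output shape.

```python
def naive_stem(token):
    """Strip a few common suffixes; purely illustrative."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

print([naive_stem(t) for t in ["walking", "jumped", "cats"]])  # → ['walk', 'jump', 'cat']
```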

**subscribe**

```python
def subscribe(publisher, pool_scheduler=None)
```

Subscribe to a publisher for event-driven processing.

**Arguments**:

* `publisher` *object* - The publisher to subscribe to.
* `pool_scheduler` *optional* - Scheduler instance for concurrency.

**Returns**:

None

## artifician.processors.normalizer

### Normalizer Objects

```python
class Normalizer(processor.Processor)
```

Normalizes a given string value.

**Attributes**:

* `strategy` *NormalizerStrategy* - Strategy for normalizing the string.
* `delimiter` *dict* - Delimiter for splitting the string.

**\_\_init\_\_**

```python
def __init__(strategy=None, subscribe_to=None, delimiter=None)
```

Initializes the Normalizer by setting up the normalizer strategy and the delimiter.

**Arguments**:

* `strategy` *NormalizerStrategy* - NormalizerStrategy instance that normalizes the string.
* `delimiter` *dict* - Delimiter for splitting the string.

**process**

```python
def process(publisher, feature_raw)
```

Normalizes the feature\_raw value. Note: since Normalizer is a processor, `publisher.feature_value` is updated in place rather than the value being returned.

**Arguments**:

* `publisher` *object* - instance of the publisher
* `feature_raw` *string* - feature value

**Returns**:

None

**subscribe**

```python
def subscribe(publisher, pool_scheduler=None)
```

Defines logic for subscribing to an event in the publisher.

**Arguments**:

* `publisher` *object* - instance of the publisher
* `pool_scheduler` *rx.scheduler.ThreadPoolScheduler* - scheduler instance for concurrency

**Returns**:

None

### NormalizerStrategy Objects

```python
class NormalizerStrategy(ABC)
```

Interface for normalizer strategies.

### PropertiesNormalizer Objects

```python
class PropertiesNormalizer(NormalizerStrategy)
```

Split by delimiter into a format that preserves the sequential position of each value found.

**normalize**

```python
def normalize(feature_raw, delimiter)
```

Splits by delimiter into a format that preserves the sequential position of each value found in the feature text.

**Arguments**:

* `feature_raw` *str* - The raw feature string.
* `delimiter` - Delimiter used to split the string.

**Returns**:

* `feature_normalized` *list* - List of tuples of the normalized feature values.
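
The positional format can be sketched as pairing each delimited value with its sequential position. The exact tuple shape is an assumption of this sketch, not specified by the documentation above.

```python
def normalize_properties(feature_raw, delimiter=","):
    """Pair each delimited value with its sequential position."""
    values = [v.strip() for v in feature_raw.split(delimiter) if v.strip()]
    return list(enumerate(values))

print(normalize_properties("red, large, cotton"))
# → [(0, 'red'), (1, 'large'), (2, 'cotton')]
```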

### PathsNormalizer Objects

```python
class PathsNormalizer(NormalizerStrategy)
```

Splits by delimiter into a format that preserves each value's position within the tree.

**get\_path\_values**

```python
@staticmethod
def get_path_values(feature_raw_values, delimiter)
```

Gets path values sequentially.

**Arguments**:

* `feature_raw_values` *list* - List of strings.
* `delimiter` *str* - Delimiter used to split the string.

**Returns**:

* `feature_normalized` *list* - List of tuples of the normalized feature text values.

**normalize**

```python
def normalize(feature_raw, delimiter)
```

Splits by delimiter into a format that preserves each value's position within the tree.

**Arguments**:

* `feature_raw` *str* - The feature text.
* `delimiter` *dict* - Delimiter used to split the string.

**Returns**:

* `feature_normalized` *list* - List of tuples of the normalized feature text values.
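
The tree-position idea can be sketched as pairing each path segment with its depth. The tuple shape is an assumption of this sketch, not specified by the documentation above.

```python
def normalize_paths(feature_raw, delimiter="/"):
    """Pair each path segment with its depth in the path."""
    segments = [s for s in feature_raw.split(delimiter) if s]
    return [(depth, segment) for depth, segment in enumerate(segments)]

print(normalize_paths("home/user/docs"))  # → [(0, 'home'), (1, 'user'), (2, 'docs')]
```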

### KeyValuesNormalizer Objects

```python
class KeyValuesNormalizer(NormalizerStrategy)
```

Splits by delimiter into a format that preserves the association between each value and its label.

**normalize\_key\_values**

```python
@staticmethod
def normalize_key_values(key_values, assignment)
```

Breaks text down into key-value pairs using the assignment separator.

**Arguments**:

* `key_values` *list* - List of strings.
* `assignment` *str* - String that separates keys from values.

**Returns**:

* `feature_normalized` *list* - List of tuples of the normalized feature text values.

**normalize**

```python
def normalize(feature_raw, delimiter)
```

Splits by delimiter into a format that preserves the association between each value and its label.

**Arguments**:

* `feature_raw` *str* - The raw feature string.
* `delimiter` - Delimiter used to split the string.

**Returns**:

* `feature_normalized` *list* - List of tuples of the normalized feature text values.
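
Key-value association can be sketched with a simple split on the assignment separator. The output shape is illustrative, not the library's exact format.

```python
def normalize_key_values(key_values, assignment="="):
    """Split each 'key=value' string into a (key, value) tuple."""
    pairs = []
    for item in key_values:
        key, _, value = item.partition(assignment)
        pairs.append((key.strip(), value.strip()))
    return pairs

print(normalize_key_values(["size=10", "color=red"]))
# → [('size', '10'), ('color', 'red')]
```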

### StrategySelector Objects

```python
class StrategySelector()
```

Selects the appropriate normalizer strategy for the given text input.

**get\_paths\_delimiter**

```python
def get_paths_delimiter(texts)
```

Identifies whether the given texts form a paths string; if so, returns the appropriate delimiter for normalizing the text.

**Arguments**:

* `texts` *list* - List of strings.

**Returns**:

* `delimiter` - The delimiter for normalizing the text if it is identified as paths text; otherwise False.

**get\_key\_values\_delimiter**

```python
def get_key_values_delimiter(texts)
```

Identifies whether the given texts form a key:values string; if so, returns the appropriate delimiter for normalizing the text.

**Arguments**:

* `texts` *list* - List of strings.

**Returns**:

* `delimiter` - The delimiter for normalizing the text if it is identified as key:values text; otherwise False.

**get\_properties\_delimiter**

```python
def get_properties_delimiter(texts)
```

Identifies whether the given texts form a properties string; if so, returns the appropriate delimiter for normalizing the text.

**Arguments**:

* `texts` *list* - List of strings.

**Returns**:

* `delimiter` *dict* - delimiter to normalize the string

**select**

```python
def select(texts)
```

Selects the appropriate normalizer strategy and delimiter for the given texts.

**Arguments**:

* `texts` *list* - List of strings.

**Returns**:

* `strategy_properties` *list* - list of strategy and properties to normalize the text

## artifician.extractors.text\_extractors.keyword\_extractor

### KeywordExtractor Objects

```python
class KeywordExtractor()
```

Keyword Extractor class for extracting specific keywords from a text.

**Attributes**:

* `method` *str* - Method to use for keyword extraction ('manual', 'frequency', 'tfidf', etc.)
* `keywords` *List\[str]* - List of keywords to search within the text for 'manual' method.

**\_\_init\_\_**

```python
def __init__(method: str = 'manual', keywords: List[str] = None)
```

Initialize a new KeywordExtractor object.

**Arguments**:

* `method` *str* - Method to use for keyword extraction.
* `keywords` *List\[str]* - List of keywords to search within the text for 'manual' method.

**Raises**:

* `ValueError` - If the keywords list is empty for 'manual' method.
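
The 'manual' method amounts to checking a keyword list against the text. A simplified sketch follows; the matching behavior (case-insensitive, whole-word) is an assumption, and the library may match differently.

```python
def extract_keywords(text, keywords):
    # Per the documented contract, 'manual' requires a non-empty keyword list.
    if not keywords:
        raise ValueError("keywords must be non-empty for the 'manual' method")
    words = set(text.lower().split())
    return [k for k in keywords if k.lower() in words]

print(extract_keywords("A fast red car", ["red", "blue"]))  # → ['red']
```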

## artifician.extractors.html\_extractors

**get\_node\_text**

```python
def get_node_text(node: List[Union[str, Tag]]) -> str
```

Extracts text from a given node.

**Arguments**:

* `node` *List\[Union\[str, Tag]]* - The node list to extract text from.

**Returns**:

* `str` - The text content of the node.

**Raises**:

* `TypeError` - If the first element in the node list is not a bs4.element.Tag.
* `ValueError` - If the node list is empty.
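
The validation pattern shared by these extractors can be sketched with a stand-in `Tag` class. bs4 is not imported here; the stub only mimics the single attribute this illustration needs, and the function body is an assumption about the checks, not the library's code.

```python
class Tag:
    """Minimal stand-in for bs4.element.Tag (illustration only)."""
    def __init__(self, text):
        self.text = text

def get_node_text(node):
    if not node:
        raise ValueError("node list must not be empty")
    if not isinstance(node[0], Tag):
        raise TypeError("first element must be a Tag")
    return node[0].text.strip()

print(get_node_text([Tag("  Hello  ")]))  # → Hello
```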

**get\_node\_attribute**

```python
def get_node_attribute(node: List[Union[str, Tag]], attribute: str) -> str
```

Retrieves the value of a specified attribute from a given node.

**Arguments**:

* `node` *List\[Union\[str, Tag]]* - The node list to get the attribute from.
* `attribute` *str* - The name of the attribute to retrieve.

**Returns**:

* `str` - The value of the attribute.

**Raises**:

* `TypeError` - If the first element in the node list is not a bs4.element.Tag.

**get\_parent\_node\_text**

```python
def get_parent_node_text(node: List[Union[str, Tag]]) -> str
```

Extracts text from the parent node of a given node.

**Arguments**:

* `node` *List\[Union\[str, Tag]]* - The node list to extract parent text from.

**Returns**:

* `str` - The text content of the parent node.

**Raises**:

* `TypeError` - If the first element in the node list is not a bs4.element.Tag.

**get\_child\_node\_text**

```python
def get_child_node_text(node: List[Union[str, Tag]]) -> str
```

Extracts text from the first child node of a given node.

**Arguments**:

* `node` *List\[Union\[str, Tag]]* - The node list to extract child text from.

**Returns**:

* `str` - The text content of the child node.

**Raises**:

* `TypeError` - If the first element in the node list is not a bs4.element.Tag.

**count\_child\_nodes**

```python
def count_child_nodes(node: List[Union[str, Tag]]) -> int
```

Counts the number of child nodes for a given node.

**Arguments**:

* `node` *List\[Union\[str, Tag]]* - The node list to count children for.

**Returns**:

* `int` - The number of child nodes.

**Raises**:

* `TypeError` - If the first element in the node list is not a bs4.element.Tag.

**get\_sibling\_node\_text**

```python
def get_sibling_node_text(node: List[Union[str, Tag]]) -> str
```

Extracts text from the first sibling node of a given node.

**Arguments**:

* `node` *List\[Union\[str, Tag]]* - The node list to extract sibling text from.

**Returns**:

* `str` - The text content of the sibling node.

**Raises**:

* `TypeError` - If the first element in the node list is not a bs4.element.Tag.

**get\_parent\_attribute**

```python
def get_parent_attribute(node: List[Union[str, Tag]], attribute: str) -> str
```

Retrieves the value of a specified attribute from the parent of a given node.

**Arguments**:

* `node` *List\[Union\[str, Tag]]* - The node list to get the parent attribute from.
* `attribute` *str* - The name of the attribute to retrieve.

**Returns**:

* `str` - The value of the attribute from the parent node.

**Raises**:

* `TypeError` - If the first element in the node list is not a bs4.element.Tag.

**get\_child\_attribute**

```python
def get_child_attribute(node: List[Union[str, Tag]], attribute: str) -> str
```

Retrieves the value of a specified attribute from the first child of a given node.

**Arguments**:

* `node` *List\[Union\[str, Tag]]* - The node list to get the child attribute from.
* `attribute` *str* - The name of the attribute to retrieve.

**Returns**:

* `str` - The value of the attribute from the child node.

**Raises**:

* `TypeError` - If the first element in the node list is not a bs4.element.Tag.

## artifician.extractors


