Understanding ETL Nodes

Modified on Wed, 06 Sep 2023


This document is a draft version and subject to further review and revision. Please refrain from distributing or relying on its contents as final.

The following ETL nodes are available in the platform.

ETL Node
Description
Remove Duplicates
This node removes duplicate rows from your data, so that each remaining row is unique with respect to the columns you specify.
Write File
This node allows you to write the processed data to a new file, typically in CSV format, for storage or further use.
Join
The Join node combines data from multiple sources or datasets based on specified keys or columns. It's useful for merging related data.
Select
With the Select node, you can choose specific columns or fields from your data while discarding others. It helps you focus on the data you need for analysis or reporting.
Filter
This node keeps only the rows that meet the conditions you specify and discards the rest.
Flatten
When you have nested or hierarchical data structures, the Flatten node helps you simplify the structure by converting it into a flat table format for easier analysis.
Aggregate
The Aggregate node summarizes data. You can apply operations such as sum, count, and average to grouped data.
Find and Replace
This node enables you to search for specific values in your data and replace them with other values as needed. It's useful for data cleaning and standardization.
Sort
The Sort node arranges your data rows in a specified order, such as ascending or descending, based on one or more columns.
Union
The Union node combines multiple datasets into a single dataset by stacking their rows on top of each other.
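
To make the node descriptions above more concrete, here is a minimal Python sketch of what each operation does to a small dataset. It is not the platform's implementation or API: every function name, parameter, and sample value below is hypothetical and chosen only for illustration. Rows are modeled as plain dictionaries, Join is shown as an inner join on a single key, and Aggregate is shown as a sum only.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample data: each row is a dict, one per record.
orders = [
    {"order_id": 1, "customer": "ann", "amount": 30.0},
    {"order_id": 2, "customer": "bob", "amount": 15.0},
    {"order_id": 2, "customer": "bob", "amount": 15.0},  # duplicate row
    {"order_id": 3, "customer": "ann", "amount": 55.0},
]
customers = [
    {"customer": "ann", "country": "N/A"},
    {"customer": "bob", "country": "DE"},
]

def remove_duplicates(rows, keys):
    """Remove Duplicates: keep the first row for each combination of key columns."""
    seen, out = set(), []
    for row in rows:
        marker = tuple(row[k] for k in keys)
        if marker not in seen:
            seen.add(marker)
            out.append(row)
    return out

def filter_rows(rows, predicate):
    """Filter: keep only the rows for which the condition holds."""
    return [r for r in rows if predicate(r)]

def select(rows, columns):
    """Select: keep only the named columns."""
    return [{c: r[c] for c in columns} for r in rows]

def sort_rows(rows, column, descending=False):
    """Sort: order rows by one column."""
    return sorted(rows, key=lambda r: r[column], reverse=descending)

def find_and_replace(rows, column, old, new):
    """Find and Replace: swap one exact value for another in a column."""
    return [{**r, column: new if r[column] == old else r[column]} for r in rows]

def join(left, right, key):
    """Join (inner, assumed here): pair rows that share the same key value."""
    index = defaultdict(list)
    for r in right:
        index[r[key]].append(r)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]

def union(*datasets):
    """Union: stack the rows of several datasets."""
    return [row for ds in datasets for row in ds]

def aggregate_sum(rows, group_by, column):
    """Aggregate: sum one column per group."""
    totals = defaultdict(float)
    for r in rows:
        totals[r[group_by]] += r[column]
    return [{group_by: k, f"sum_{column}": v} for k, v in totals.items()]

def flatten(record, prefix=""):
    """Flatten: turn nested dicts into one level with dotted column names."""
    flat = {}
    for k, v in record.items():
        name = f"{prefix}{k}"
        if isinstance(v, dict):
            flat.update(flatten(v, prefix=f"{name}."))
        else:
            flat[name] = v
    return flat

def write_csv(rows, handle):
    """Write File: write rows as CSV to an open text handle."""
    writer = csv.DictWriter(handle, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)

# A small pipeline chaining several of the operations.
unique = remove_duplicates(orders, keys=["order_id"])
large = filter_rows(unique, lambda r: r["amount"] >= 20)
cleaned = find_and_replace(customers, "country", "N/A", "unknown")
joined = join(unique, cleaned, key="customer")
totals = sort_rows(aggregate_sum(joined, "customer", "amount"),
                   "sum_amount", descending=True)
buffer = io.StringIO()  # stands in for a real output file
write_csv(totals, buffer)
```

In this sketch the duplicate order is dropped first, so the per-customer totals are computed over three orders rather than four. The order of the steps matters in the same way when nodes are chained in a pipeline: deduplicating or filtering before an Aggregate step changes what gets summarized.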