Understanding ETL Nodes

Modified on Wed, 06 Sep 2023


This document is a draft and is subject to further review and revision. Please do not distribute it or rely on its contents as final.

The following ETL nodes are available in the platform.

ETL Node	Description
Remove Duplicates	This node removes duplicate rows or records from your data. It ensures that each row is unique based on the criteria you specify, such as one or more key columns.
Write File	This node allows you to write the processed data to a new file, typically in CSV format, for storage or further use.
Join	The Join node combines data from multiple sources or datasets based on specified keys or columns. It's useful for merging related data.
Select	With the Select node, you can choose specific columns or fields from your data while discarding others. It helps you focus on the data you need for analysis or reporting.
Filter	This node lets you filter data rows based on specific conditions or criteria. You retain only the rows that meet those conditions.
Flatten	When you have nested or hierarchical data structures, the Flatten node helps you simplify the structure by converting it into a flat table format for easier analysis.
Aggregate	The Aggregate node summarizes data. You can compute values such as sums, counts, and averages for groups of rows.
Find and Replace	This node enables you to search for specific values in your data and replace them with other values as needed. It's useful for data cleaning and standardization.
Sort	The Sort node arranges your data rows in a specified order, such as ascending or descending, based on one or more columns.
Union	The Union node combines multiple datasets into a single dataset, essentially stacking them on top of each other.
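To make the row-level nodes more concrete, here is a minimal plain-Python sketch of what Filter, Select, Find and Replace, Remove Duplicates, and Sort do to a dataset. It assumes each dataset is a list of dictionaries, one per row. It does not use or represent the platform's own API, and the function names (`filter_rows`, `select_columns`, `find_and_replace`, `remove_duplicates`, `sort_rows`) are hypothetical. Find and Replace here only swaps whole cell values that match exactly, and Sort applies one direction to all sort columns.

```python
from typing import Any, Callable, Iterable

# Hypothetical row representation: one dictionary per row, keyed by column name.
Row = dict[str, Any]


def filter_rows(rows: Iterable[Row], predicate: Callable[[Row], bool]) -> list[Row]:
    """Filter: keep only the rows that satisfy the condition."""
    return [row for row in rows if predicate(row)]


def select_columns(rows: Iterable[Row], columns: list[str]) -> list[Row]:
    """Select: keep the listed columns and discard the rest."""
    return [{col: row[col] for col in columns} for row in rows]


def find_and_replace(rows: Iterable[Row], column: str, find: Any, replace: Any) -> list[Row]:
    """Find and Replace: swap cells in `column` that equal `find` for `replace`."""
    return [{**row, column: replace if row[column] == find else row[column]} for row in rows]


def remove_duplicates(rows: Iterable[Row], key_columns: list[str]) -> list[Row]:
    """Remove Duplicates: keep the first row seen for each combination of key values."""
    seen: set[tuple[Any, ...]] = set()
    unique: list[Row] = []
    for row in rows:
        key = tuple(row[col] for col in key_columns)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def sort_rows(rows: Iterable[Row], columns: list[str], descending: bool = False) -> list[Row]:
    """Sort: order rows by one or more columns, all in the same direction."""
    return sorted(rows, key=lambda row: tuple(row[col] for col in columns), reverse=descending)


# Example: clean, filter, deduplicate, select, and sort a small order list.
orders = [
    {"id": 1, "region": "east", "amount": 120, "status": "ok"},
    {"id": 2, "region": "west", "amount": 80, "status": "ok"},
    {"id": 1, "region": "east", "amount": 120, "status": "ok"},  # duplicate of id 1
    {"id": 3, "region": "E", "amount": 200, "status": "cancelled"},
]

cleaned = find_and_replace(orders, "region", "E", "east")
active = filter_rows(cleaned, lambda row: row["status"] == "ok")
unique = remove_duplicates(active, ["id"])
result = sort_rows(select_columns(unique, ["id", "amount"]), ["amount"], descending=True)
print(result)  # [{'id': 1, 'amount': 120}, {'id': 2, 'amount': 80}]
```

Chaining the functions mirrors connecting nodes in a pipeline: each step takes the previous step's output as its input.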
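The nodes that combine or summarize datasets (Join, Union, and Aggregate) can be illustrated the same way. This is a plain-Python sketch, assuming each dataset is a list of dictionaries; it is not the platform's API, and `inner_join`, `union_all`, and `aggregate` are hypothetical names. Only an inner join is shown, and when both sides share a column name the right-hand value wins; this sketch does not cover which join types or collision rules the Join node itself supports.

```python
from collections import defaultdict
from typing import Any

# Hypothetical row representation: one dictionary per row, keyed by column name.
Row = dict[str, Any]


def inner_join(left: list[Row], right: list[Row], key: str) -> list[Row]:
    """Join: combine each left row with every right row that has the same key value."""
    index: dict[Any, list[Row]] = defaultdict(list)
    for row in right:
        index[row[key]].append(row)
    # Left rows with no matching key are dropped (inner join semantics).
    return [{**l_row, **r_row} for l_row in left for r_row in index.get(l_row[key], [])]


def union_all(*datasets: list[Row]) -> list[Row]:
    """Union: stack datasets on top of each other; duplicate rows are kept."""
    return [row for dataset in datasets for row in dataset]


def aggregate(rows: list[Row], group_by: str, value: str) -> list[Row]:
    """Aggregate: compute the count, sum, and average of `value` for each group."""
    groups: dict[Any, list[Any]] = defaultdict(list)
    for row in rows:
        groups[row[group_by]].append(row[value])
    return [
        {group_by: k, "count": len(vals), "sum": sum(vals), "avg": sum(vals) / len(vals)}
        for k, vals in groups.items()
    ]


# Example: stack two order batches, attach customer regions, then summarize by region.
customers = [{"cust_id": 1, "region": "east"}, {"cust_id": 2, "region": "west"}]
q1 = [{"cust_id": 1, "amount": 100}, {"cust_id": 2, "amount": 50}]
q2 = [{"cust_id": 1, "amount": 300}]

all_orders = union_all(q1, q2)  # 3 rows
joined = inner_join(all_orders, customers, "cust_id")
print(aggregate(joined, "region", "amount"))
# [{'region': 'east', 'count': 2, 'sum': 400, 'avg': 200.0}, {'region': 'west', 'count': 1, 'sum': 50, 'avg': 50.0}]
```

Building a dictionary index on the right-hand dataset keeps the join close to linear in the number of rows, instead of comparing every left row with every right row.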
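Flatten and Write File often appear together at the end of a pipeline: nested records are flattened into columns and then written out as CSV. The sketch below illustrates that idea using only Python's standard `csv` module. The helper names `flatten` and `write_csv` and the dotted column-naming scheme are assumptions for illustration, not the platform's behavior. It flattens nested objects only and leaves lists as single values, whereas the Flatten node may handle arrays differently.

```python
import csv
import io
from typing import Any, TextIO


def flatten(record: dict[str, Any], parent: str = "", sep: str = ".") -> dict[str, Any]:
    """Flatten: turn nested dictionaries into one level with dotted column names."""
    flat: dict[str, Any] = {}
    for key, value in record.items():
        name = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            flat.update(flatten(value, name, sep))  # recurse into nested objects
        else:
            flat[name] = value  # scalars and lists are kept as-is
    return flat


def write_csv(rows: list[dict[str, Any]], target: TextIO) -> None:
    """Write File: write rows as CSV, using every column seen across rows as the header."""
    # Collect column names in first-seen order so rows with extra keys do not fail.
    fieldnames = list(dict.fromkeys(key for row in rows for key in row))
    writer = csv.DictWriter(target, fieldnames=fieldnames)  # missing cells become ""
    writer.writeheader()
    writer.writerows(rows)


# Example: flatten nested events and write them to an in-memory buffer.
events = [
    {"id": 1, "user": {"name": "Ana", "address": {"city": "Lisbon"}}},
    {"id": 2, "user": {"name": "Ben", "address": {"city": "Oslo"}}},
]
buffer = io.StringIO()
write_csv([flatten(e) for e in events], buffer)
print(buffer.getvalue())
# id,user.name,user.address.city
# 1,Ana,Lisbon
# 2,Ben,Oslo
```

To write to a real file instead of a buffer, open it with `open("output.csv", "w", newline="")`, as the `csv` module documentation recommends, and pass the file object to `write_csv`.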

See Also

Data Pipeline

Introduction to ETL