AWS Glue already provides code-based and visual interfaces to simplify ETL processes, and Glue DataBrew adds a way to clean and transform data without writing code.
It works with AWS data lakes, data warehouses and databases including S3, Redshift, Aurora and RDS.
DataBrew provides more than 250 pre-built transformations to automate data preparation tasks such as filtering anomalies, standardising formats, and correcting invalid values.
Examples include normalising data to standard date and time values, generating aggregates for analyses, and correcting invalid, misclassified, or duplicative data. Natural language processing capabilities make it possible to perform more sophisticated transformations.
Assembled steps can be saved as "recipes" for future reuse.
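To make the idea of a recipe concrete, the sketch below models one as an ordered list of steps that can be replayed against any dataset. It is a plain-Python illustration only, not DataBrew's API: the function names (`normalise_dates`, `replace_invalid`, `drop_duplicates`, `apply_recipe`), the column names and the sample rows are all hypothetical.

```python
from datetime import datetime

# Hypothetical step functions for illustration; these are not DataBrew operations.

def normalise_dates(rows, column, input_format):
    """Rewrite a date column from input_format to ISO 8601 (YYYY-MM-DD)."""
    out = []
    for row in rows:
        new_row = dict(row)
        parsed = datetime.strptime(row[column], input_format)
        new_row[column] = parsed.date().isoformat()
        out.append(new_row)
    return out

def replace_invalid(rows, column, is_valid, default):
    """Replace values that fail the is_valid check with a default."""
    return [
        {**row, column: row[column] if is_valid(row[column]) else default}
        for row in rows
    ]

def drop_duplicates(rows):
    """Keep only the first occurrence of each identical row."""
    seen = set()
    out = []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            out.append(row)
    return out

def apply_recipe(recipe, rows):
    """Run each (step, kwargs) pair in order, feeding the output forward."""
    for step, kwargs in recipe:
        rows = step(rows, **kwargs)
    return rows

# A "recipe" here is just the saved, ordered list of steps.
recipe = [
    (normalise_dates, {"column": "signup", "input_format": "%d/%m/%Y"}),
    (replace_invalid, {
        "column": "age",
        "is_valid": lambda v: isinstance(v, int) and 0 <= v <= 120,
        "default": None,
    }),
    (drop_duplicates, {}),
]

raw = [
    {"name": "Ana", "signup": "03/11/2020", "age": 34},
    {"name": "Ana", "signup": "03/11/2020", "age": 34},
    {"name": "Ben", "signup": "12/11/2020", "age": -5},
]

cleaned = apply_recipe(recipe, raw)
```

Because the recipe is data rather than hard-coded logic, the same list can be applied unchanged to a later batch of records, which is the reuse the article describes.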
Prepared data is published to Amazon S3, ready for use in analytics and machine learning applications.
AWS Glue DataBrew is serverless and fully managed, and is charged on a pay-per-use basis when creating and running transformations on datasets.
“AWS customers are using data for analytics and machine learning at an unprecedented pace. However, these customers regularly tell us that their teams spend too much time on the undifferentiated, repetitive, and mundane tasks associated with data preparation,” said AWS vice president of database and analytics Raju Gulabani.
“Customers love the scalability and flexibility of code-based data preparation services like AWS Glue, but they could also benefit from allowing business users, data analysts, and data scientists to visually explore and experiment with data independently, without writing code.
“AWS Glue DataBrew features an easy-to-use visual interface that helps data analysts and data scientists of all technical levels understand, combine, clean, and transform data.”
AWS Glue DataBrew is generally available today in various AWS regions including Asia Pacific (Sydney), with wider availability coming soon.