Amundsen Monthly Update — August 2021

Mark Grover

Published in amundsen-io · 4 min read · Sep 14, 2021

Summary

August highlights from the Amundsen community:

Driving Data Governance Maturity with Amundsen at Revolut
Amundsen at Devoted Health
Using Column Level Lineage in Amundsen at Alvin
Extending support for databuilder Atlas integration with table metadata models
OpenLineage extractor for databuilder
Generalized application model and added Databricks application support
Data quality integration

All that and more details below!

Check out last month’s highlights here.

Don’t forget: Join our Slack community at slack.amundsen.io. We can’t wait to meet you!

Driving Data Governance Maturity with Amundsen at Revolut

Dinan Amiendiartha, a Data Risk Manager at Revolut, joined us at our last community meeting to share how his team drove data governance maturity with Amundsen. Since January 2020, Revolut has expanded its global presence, seen a 4x increase in data volume (and it is still growing exponentially!), and rapidly developed new products for Retail. Due to this hypergrowth, Revolut needed a solution to better manage its data and build greater trust. Dinan discussed why they chose Amundsen, dove into the details of their databuilder ingestion pipeline, explained how Amundsen is supporting their 4 data governance initiatives, and shared the overall impact of Amundsen.

Check out Dinan’s presentation to hear all the details.💡

Amundsen at Devoted Health

Adam Boscarino, Manager of Data Engineering at Devoted Health, joined us at our last community meeting to share how his team has been using Amundsen at Devoted. The Devoted team was experiencing problems that many of us have heard time and time again. They had a growing data platform with an increasing number of pipelines and tables, but no source of truth for documentation and no data lineage. They were finding it difficult to understand their own documentation and how tables were populated. There was too much tribal knowledge. They needed a single source of truth. Adam discussed why they chose Amundsen, dove deeper into their infrastructure and customization, highlighted their dbt integration, and wrapped up with impact and future work.

Check out Adam’s presentation to learn more.💡

Using Column Level Lineage in Amundsen at Alvin

Oisin Coveney, a Software Engineer at Alvin, joined us at our last community meeting to share the development process of enabling column level lineage in Amundsen. He shared how they see lineage at Alvin and then walked through how they built a transformer for Amundsen and mapped Alvin to enable column level lineage.

Check out Oisin’s presentation to learn more.💡

Extending support for databuilder Atlas integration with table metadata models

We have extended support for the databuilder Atlas integration with table metadata models. 🎉

What does this mean?

You can use databuilder extractors for a majority of table-related metadata to populate data in Atlas
This is a backwards-compatible feature (although you might want to test it in your dev/acc Atlas before running this on prod)
This change bridges the gap in terms of Neo4j vs. Atlas feature coverage, leaving just Column Features support to be introduced
You can still use Atlas tools for metadata ingestion (like the Hive hook or your custom scripts) in parallel with databuilder ingestion to load the metadata from other databases
You can use sample_data_loader_atlas.py to kickstart your Atlas test instance with the same sample data that is used for Neo4j or RDS

Huge shoutout to Mariusz Gorski!

OpenLineage extractor for databuilder

We now have an OpenLineage extractor for databuilder! This means you can extract table lineage information from OpenLineage events.
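
To make the idea concrete, here is a minimal sketch of how table lineage can be read out of an OpenLineage run event. It is not the databuilder extractor itself: it uses only the Python standard library, the helper names `dataset_key` and `table_lineage_pairs` are hypothetical, the event is made up, and the `namespace/name` key format is purely illustrative rather than Amundsen's table key format. The only thing it relies on is that an OpenLineage event lists its `inputs` and `outputs` datasets, each with a `namespace` and a `name`.

```python
import json
from typing import Iterator, Tuple


def dataset_key(dataset: dict) -> str:
    """Build an illustrative key from an OpenLineage dataset reference.

    Assumption: this "namespace/name" format is for demonstration only.
    """
    return f"{dataset['namespace']}/{dataset['name']}"


def table_lineage_pairs(event: dict) -> Iterator[Tuple[str, str]]:
    """Yield an (upstream, downstream) pair for every input/output combination."""
    for upstream in event.get("inputs", []):
        for downstream in event.get("outputs", []):
            yield dataset_key(upstream), dataset_key(downstream)


# A trimmed, made-up OpenLineage run event (real events carry more fields).
raw_event = """
{
  "eventType": "COMPLETE",
  "eventTime": "2021-08-01T00:00:00Z",
  "run": {"runId": "00000000-0000-0000-0000-000000000000"},
  "job": {"namespace": "example_namespace", "name": "example_job"},
  "inputs": [{"namespace": "hive://gold", "name": "sales.raw_orders"}],
  "outputs": [{"namespace": "hive://gold", "name": "sales.daily_orders"}]
}
"""

pairs = list(table_lineage_pairs(json.loads(raw_event)))
for upstream, downstream in pairs:
    print(f"{upstream} -> {downstream}")
```

Each pair says that the downstream table was produced from the upstream table, which is the kind of relationship table lineage captures.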

See PR details for more info. Props to Dominik Choma!

Generalized application model and added Databricks application support

Amundsen has had first-class support for Airflow from Day 1. However, many companies don’t use Airflow, so we have now generalized the application model. Specifically, we have added support for Databricks applications, for those companies that use Databricks’ ETL jobs.

See PR details for more info. Props to Jack Roof!

Data quality integration

We are starting to integrate existing Data Quality projects into Amundsen. The first step includes updating the frontend to display the status of the table. Specifically, we are working on displaying table status as: number of checks, number passing, and an optional timestamp.
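
As a rough illustration of that table status, the sketch below models the three pieces of information as a small Python dataclass. Everything in it is an assumption made for the example: the class name `TableQualityStatus`, its field names, and the `summary` format are hypothetical and are not taken from the RFC or from Amundsen's code.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class TableQualityStatus:
    """Hypothetical status payload; names are illustrative, not Amundsen's API."""

    num_checks_total: int
    num_checks_passing: int
    last_run_timestamp: Optional[datetime] = None  # optional, as in the proposal

    def summary(self) -> str:
        """Render a short, human-readable status line."""
        text = f"{self.num_checks_passing}/{self.num_checks_total} checks passing"
        if self.last_run_timestamp is not None:
            text += f" (last run {self.last_run_timestamp:%Y-%m-%d %H:%M})"
        return text


status = TableQualityStatus(num_checks_total=12, num_checks_passing=11)
print(status.summary())
```

Keeping the timestamp optional means a data quality tool that only reports counts can still populate the status.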

Check out the open RFC and let us know what you think! 👀

Announcements

📣 We hosted office hours last month aimed at those just getting started with Amundsen. Check out the recording here.

📣 We are not hosting a September community meeting due to Mark Grover’s vacation earlier this month. Our next meeting will be on Thursday, October 7. See you next month!

Coming up next…

Next community meeting

Date: Thursday, October 7, 9am Pacific, 12pm Eastern, 6pm Central Europe
Add to your calendar: https://evt.to/diuoeeiw

Join us on Slack: slack.amundsen.io

Subscribe for periodic updates: Medium & Twitter

Curated with ❤ by Stemma