Amundsen Monthly Update — May 2021

Mark Grover
Published in amundsen-io · 5 min read · Jun 16, 2021


Why Snap chose Amundsen, dbt integration, lineage graph viz, mono repo, and more!

Summary

May highlights from the Amundsen community:

  • Why Snap chose Amundsen
  • dbt ♥ Amundsen
  • Lineage graph viz
  • One repo to rule them all
  • ML feature discovery in Amundsen
  • Answer how to best use this table — with queries in Amundsen
  • Sydney DataEng meetup

All that and more details below!

Check out last month’s highlights here.

Don’t forget: Join our Slack community at slack.amundsen.io. We can’t wait to meet you!

Why Snap chose Amundsen

Video: Why Snap chose Amundsen

This month, we heard from Justin Jordan, a developer on Snap’s analytics engineering team, about why his team chose Amundsen and how they use it to solve their data discovery problems, specifically around metrics discovery.

Problem

Due to high growth, Snap ran into problems related to metrics discovery. With the number of metrics accelerating every year, it became difficult to know what metrics existed and how to use them. Challenging questions arose: Who owns the metrics? Where are they surfaced? What do they mean? Where is their data stored?

Solution

Justin and his team chose Amundsen to build Snap’s metric catalog. This catalog allows teams to search for metrics and definitions. This is a huge step forward — now teams can see all relevant technical and business metadata for their metrics. They can manage the entire metric lifecycle with one source of truth.

Check out Justin’s presentation below to hear him explain why Amundsen was chosen and what’s next for his team.

dbt ♥ Amundsen

dbt has enabled thousands of analysts and analytics engineers to transform data in their data warehouses. Many Amundsen users are also heavy dbt users. To that end, we have built a deep integration with dbt that ingests rich metadata like lineage, tags, and descriptions from dbt into Amundsen.

Thanks to contributions from Grant Seward, Amundsen now integrates with dbt’s core metadata. There’s still a lot to be done here. Join Amundsen Slack — we’d love your feedback.
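
To give a feel for the kind of metadata involved, here is a minimal sketch of pulling descriptions, tags, and upstream lineage out of a dbt `manifest.json`. It is not Amundsen's actual extractor: the function name `extract_model_metadata` and the sample project `shop` are made up for illustration, and the sketch assumes only the `nodes`, `resource_type`, `description`, `tags`, and `depends_on.nodes` fields of the manifest.

```python
from typing import Dict, List, TypedDict


class ModelMetadata(TypedDict):
    description: str
    tags: List[str]
    upstream: List[str]


def extract_model_metadata(manifest: dict) -> Dict[str, ModelMetadata]:
    """Collect description, tags, and upstream nodes for each dbt model."""
    models: Dict[str, ModelMetadata] = {}
    for unique_id, node in manifest.get("nodes", {}).items():
        if node.get("resource_type") != "model":
            continue  # skip tests, seeds, snapshots, and so on
        models[unique_id] = {
            "description": node.get("description", ""),
            "tags": list(node.get("tags", [])),
            # depends_on.nodes lists the unique_ids this model reads from
            "upstream": list(node.get("depends_on", {}).get("nodes", [])),
        }
    return models


# Hypothetical, hand-written sample shaped like a dbt manifest.json.
SAMPLE_MANIFEST = {
    "nodes": {
        "model.shop.orders": {
            "resource_type": "model",
            "description": "One row per order.",
            "tags": ["finance"],
            "depends_on": {"nodes": ["model.shop.stg_orders"]},
        },
        "model.shop.stg_orders": {
            "resource_type": "model",
            "description": "Staged raw orders.",
            "tags": [],
            "depends_on": {"nodes": ["source.shop.raw.orders"]},
        },
        "test.shop.not_null_orders_id": {
            "resource_type": "test",
            "depends_on": {"nodes": ["model.shop.orders"]},
        },
    }
}

if __name__ == "__main__":
    for model_id, meta in extract_model_metadata(SAMPLE_MANIFEST).items():
        print(model_id, meta)
```

In a real setup you would load the manifest with `json.load` from dbt's `target/` directory and hand the result to whatever publishes metadata into your catalog.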

Lineage Graph Viz

Graph Visualization design

Amundsen already has support for lineage. However, the default way of viewing lineage so far has been a list-based view with downstream and upstream tabs. Amundsen has been working on a graphical view of lineage. Thanks to Knowl Baek for designs and to Verdan Mahmood for building the first beta version of the graph viz.

Amundsen lineage currently works with the Neo4j proxy, and support for the Atlas proxy is coming soon.

We also want to highlight a common question from the community — how do you enable lineage on the front end? Verdan provides the answer in our community meeting recording here.

One repo to rule them all

As you know, Amundsen has a microservice-based architecture. For more flexibility, we started with one code repo for each of the four services — frontend service, metadata service, search service, and databuilder (which is a library not a service, but you get the idea). We learned from community and developer feedback that having different code repositories was a big pain, especially when changes needed to be coordinated across repositories.

We heard you!

We decided to move Amundsen code to one repo. If you develop on Amundsen, you should switch all your development to the new main branch. If you deploy Amundsen, this change doesn’t have much impact on you since the Python packages remain unchanged.

ML feature discovery in Amundsen

Design of Feature page in Amundsen

It’s happening! For a long time, many of us have wanted to integrate our ML projects in Amundsen. Feature engineering is one of the most time-consuming steps for ML engineers, and Amundsen is here to help. Allison Suarez Miranda and Knowl Baek proposed an RFC for indexing ML projects in Amundsen. The goal is for ML features to be indexed as first-class citizens just like tables, dashboards, and people — providing easy discoverability and ultimately, greater model re-usage and overall productivity.

Answer how to best use this table — with queries in Amundsen

Proposed model for storing common join conditions

Another exciting proposal for our roadmap, this time from Grant Seward — the ability to index queries and their individual components. Queries are one of the most common ways that Amundsen users interact with data. There is a significant amount of latent information that exists within these queries, which can be useful for users to discover. What does indexing queries mean for you? This means Amundsen will be able to answer key questions such as:

  • What tables are most commonly joined to this table and how?
  • Are there any filters (where clauses) that are always applied when accessing this table?
  • What real-world queries are being used on this table?
  • What queries are relevant now, given the recent queries that have been executed?

This will help you discover how to use your data.
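
As a rough illustration of the first question, here is a small sketch that counts which tables are most often joined to a given table. It assumes the join pairs have already been extracted from query logs. The `JoinRecord` shape, the function `most_common_joins`, and the sample data are all hypothetical and are not the model proposed in the RFC.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple, Tuple


class JoinRecord(NamedTuple):
    """One join observed in a query (hypothetical shape)."""
    left_table: str
    right_table: str
    condition: str


def most_common_joins(
    records: Iterable[JoinRecord], table: str, top_n: int = 3
) -> List[Tuple[Tuple[str, str], int]]:
    """Return the (other_table, condition) pairs most often joined to `table`."""
    counts: Counter = Counter()
    for rec in records:
        if rec.left_table == table:
            counts[(rec.right_table, rec.condition)] += 1
        elif rec.right_table == table:
            counts[(rec.left_table, rec.condition)] += 1
    return counts.most_common(top_n)


# Made-up join records standing in for parsed query logs.
SAMPLE_JOINS = [
    JoinRecord("orders", "users", "orders.user_id = users.id"),
    JoinRecord("users", "orders", "orders.user_id = users.id"),
    JoinRecord("orders", "payments", "orders.id = payments.order_id"),
    JoinRecord("users", "sessions", "users.id = sessions.user_id"),
]

if __name__ == "__main__":
    for (other, condition), n in most_common_joins(SAMPLE_JOINS, "orders"):
        print(f"{other} ON {condition}: seen {n} time(s)")
```

The hard part in practice is parsing real SQL into records like these. The counting step is the easy bit, which is why the sketch starts from pre-extracted joins.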

Sydney DataEng meetup

Peter Hanssens presented at the Sydney DataEng meetup! He spoke about Amundsen/Neptune/CDK. See the video below.

Amundsen at Sydney DataEng meetup (yes, in person!)

Coming up next…

Shared dependencies across microservices

We’re continuing to work on making Amundsen easy to install and develop. After the move to a monorepo, we are working on unifying shared dependencies across the various microservices. The goal is to avoid any potential version mismatch in libraries shared across the frontend, metadata, and search services. You can read more details here.
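
To make the mismatch problem concrete, here is a minimal sketch that compares pinned versions across per-service requirements files and reports packages pinned differently. It is only an illustration of the problem, not the approach the project is taking. The helper names and the package versions in the sample are made up, and the parser handles only simple `name==version` lines.

```python
from typing import Dict


def parse_pins(requirements_text: str) -> Dict[str, str]:
    """Map package name to pinned version for simple `name==version` lines."""
    pins: Dict[str, str] = {}
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if "==" not in line:
            continue  # ignore blanks and unpinned or range specifiers
        name, version = line.split("==", 1)
        pins[name.strip().lower()] = version.strip()
    return pins


def find_mismatches(service_requirements: Dict[str, str]) -> Dict[str, Dict[str, str]]:
    """Return packages whose pinned version differs between services."""
    versions_by_package: Dict[str, Dict[str, str]] = {}
    for service, text in service_requirements.items():
        for package, version in parse_pins(text).items():
            versions_by_package.setdefault(package, {})[service] = version
    return {
        package: by_service
        for package, by_service in versions_by_package.items()
        if len(set(by_service.values())) > 1
    }


# Made-up requirements for illustration; the versions are not Amundsen's real pins.
SAMPLE_REQUIREMENTS = {
    "frontend": "flask==1.1.2\nrequests==2.25.1\n",
    "metadata": "flask==1.0.2  # older pin\nrequests==2.25.1\n",
    "search": "flask==1.1.2\nrequests==2.25.1\n",
}

if __name__ == "__main__":
    for package, by_service in find_mismatches(SAMPLE_REQUIREMENTS).items():
        print(package, by_service)
```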

Dashboard support in Atlas metadata proxy

Amundsen was the first data catalog to add support for dashboards, reducing the duplication of analytical work and easing change management. We currently support Tableau, Mode, Redash, and Superset. Until now, dashboards only worked with the Neo4j backend. We are now working on making dashboards work for Amundsen users who use the Atlas backend.

Next community meeting

Join us on Slack: slack.amundsen.io

Subscribe for periodic updates: Medium & Twitter

Curated with ❤ by Stemma
