Juan Benavides Nanni, SDET II
Pinterest conversions are critical for businesses looking to optimize their campaigns and track the performance of their advertisements. By leveraging Pinterest’s Conversion API and Conversion Tag, advertisers can gain deeper insights into user behavior and fine-tune their marketing efforts.
To make this process seamless for developers, we’ve created two NPM packages: pinterest-conversions-server and pinterest-conversions-client. These packages simplify the integration of Pinterest’s Conversion API and Conversion Tag, offering robust solutions for server-side and client-side tracking.
Pinterest conversions allow businesses to:
However, implementing these features often requires complex setup and knowledge of Pinterest’s API. Our packages help remove these barriers, making integration faster and more developer-friendly.
Our goal is to provide developers with a streamlined way to:
This package enables server-side event tracking using Pinterest’s Conversion API. Ideal for back-end developers, it supports advanced use cases like hashed user data and precise event attribution.
npm install pinterest-conversions-server
a. Initialize the API Client
import { PinterestConversionsAPI } from "pinterest-conversions-server";
const conversionsAPI = new PinterestConversionsAPI("<ACCESS_TOKEN>");
b. Track an Event
const data = {
  event_name: "purchase",
  action_source: "web",
  event_time: Math.floor(Date.now() / 1000), // Unix timestamp in seconds
  user_data: {
    external_id: ["<hashed_external_id>"], // hashed user identifiers
    em: ["<hashed_email>"],
  },
};

const response = await conversionsAPI.trackEvent("<ADVERTISER_ID>", data);
c. Set Up an API Server
import { PinterestConversionsServer } from "pinterest-conversions-server";
new PinterestConversionsServer("<ACCESS_TOKEN>", "<ADVERTISER_ID>").startPinterestApiServer(3000);
This package simplifies client-side tracking with the Pinterest Tag. It’s perfect for tracking events directly from the browser.
npm install pinterest-conversions-client
a. Initialize the Pinterest Tag
import { PinterestTag } from "pinterest-conversions-client";
const pinterestTag = new PinterestTag("<PIXEL_ID>");
b. Track an Event
pinterestTag.track("custom_event", {
  product_id: "12345",
  value: 100,
});
Both packages are designed with testing in mind. A sample event payload looks like this:
{
  "event_name": "purchase",
  "action_source": "web",
  "event_time": 1609459200,
  "user_data": {
    "em": ["hashed_email"],
    "external_id": ["hashed_id"]
  }
}
When using these packages, ensure that all sensitive data, such as email addresses, is properly hashed, and that you clearly disclose and obtain any legally required consent for the collection, sharing, and use (including use by Pinterest) of the data you share, as set forth in our Ad Data Terms. Additionally, configure CORS settings if deploying the server package in a multi-origin environment.
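For example, here is a minimal sketch of normalizing and hashing an identifier with Node's built-in crypto module before placing it in user_data. The helper name and normalization rules are illustrative only; consult Pinterest's documentation for the exact hashing requirements per field (commonly SHA-256 of a trimmed, lowercased value).

```typescript
import { createHash } from "node:crypto";

// Hash a raw identifier with SHA-256 after basic normalization.
// The normalization shown here (trim + lowercase) is an assumption;
// follow Pinterest's docs for the exact rules per field.
function sha256Normalized(value: string): string {
  return createHash("sha256").update(value.trim().toLowerCase()).digest("hex");
}

const userData = {
  em: [sha256Normalized("user@example.com")],
  external_id: [sha256Normalized("user-12345")],
};
```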
Both packages are open source and available under the MIT license. We welcome contributions and feedback from the developer community.
Tracking conversions shouldn’t be a hassle. With pinterest-conversions-server and pinterest-conversions-client, you can quickly integrate Pinterest’s powerful tracking tools into your projects. Start tracking, optimize your campaigns, and help boost your ROI with ease.
To learn more about engineering at Pinterest, check out the rest of our Engineering Blog and visit our Pinterest Labs site. To explore and apply to open roles, visit our Careers page.
Simplify Pinterest Conversion Tracking with NPM Packages was originally published in Pinterest Engineering Blog on Medium, where people are continuing the conversation by highlighting and responding to this story.
Forms are everywhere online, from signing up for newsletters to making purchases. But let’s be honest — nothing’s more frustrating than a form that’s hard to fill out or riddled with unclear error messages. In this post, we’ll dive into practical tips and tricks to make your form validation seamless, user-friendly, and maybe even enjoyable!
We’ll walk through tips for using built-in HTML features and creating custom validation with JavaScript. No complicated jargon — just practical steps to improve your forms.
If you set the input type as “text” for a password field, the password will not be obscured as you type. Similarly, if you use the “text” input type for an email field, the browser’s default email pattern check will not be triggered.
When you use the correct input type, the password field will obscure the characters being typed, which is the desired behavior. Additionally, if you type an incorrect email address in the email field, the browser will notify you with a pop-up because of the built-in validation it provides.
Code:
Add Form Validation Tip #1 · hritik5102/Form-Validation-Tips@502e416
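As a minimal sketch of Tip #1 (the element IDs are hypothetical), switching the input's type is enough to turn on the browser's built-in behavior:

```typescript
const email = document.querySelector<HTMLInputElement>("#email")!;
const password = document.querySelector<HTMLInputElement>("#password")!;

email.type = "email";       // enables the browser's built-in email pattern check
password.type = "password"; // obscures characters as the user types

email.value = "not-an-email";
console.log(email.checkValidity());   // false: fails the built-in email check
console.log(email.validationMessage); // the browser-provided error text
```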
If the username is a required field in your form, mark it as “required” to ensure it contains valid data. If the user tries to submit the form without filling in this field, the browser will prompt them to complete it. You can also restrict the minimum number of characters by using the “minlength” attribute.
Code:
Add Form Validation Tip #2 · hritik5102/Form-Validation-Tips@cd8abbb
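A sketch of Tip #2 along the same lines (the field ID and length limit are hypothetical): marking the field required and setting a minimum length lets the browser block submission and prompt the user.

```typescript
const username = document.querySelector<HTMLInputElement>("#username")!;
username.required = true; // the browser prompts if the field is left empty
username.minLength = 5;   // values shorter than 5 characters are rejected

const form = document.querySelector<HTMLFormElement>("form")!;
form.addEventListener("submit", (event) => {
  // With native validation the browser already blocks invalid submissions;
  // this handler just shows how to trigger the same checks from script.
  if (!form.checkValidity()) {
    event.preventDefault();
    form.reportValidity(); // displays the native validation messages
  }
});
```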
Code:
Add Form Validation Tip #3 · hritik5102/Form-Validation-Tips@3b3230b
Code:
Add Form Validation Tip #4 · hritik5102/Form-Validation-Tips@9e64865
Code:
Add Form Validation Tip #5 · hritik5102/Form-Validation-Tips@73c63bc
Notice how the error message is displayed only when the end user shifts focus to the next input field.
Code:
Add Form Validation Tip #6 · hritik5102/Form-Validation-Tips@1ed19a1
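A minimal sketch of this blur-based validation, assuming an input and an adjacent error element with these hypothetical IDs:

```typescript
const field = document.querySelector<HTMLInputElement>("#email")!;
const error = document.querySelector<HTMLElement>("#email-error")!;

// Validate only when the user leaves the field, not on every keystroke.
field.addEventListener("blur", () => {
  error.textContent = field.validity.valid ? "" : field.validationMessage;
});

// Clear the message as soon as the user fixes the value.
field.addEventListener("input", () => {
  if (field.validity.valid) error.textContent = "";
});
```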
Code:
Add Form Validation Tip #7 · hritik5102/Form-Validation-Tips@a83908c
Code:
Add Form Validation Tip #8 · hritik5102/Form-Validation-Tips@138ae47
Code:
Add Form Validation Tip #9 · hritik5102/Form-Validation-Tips@068226b
Why should one validate data if it is already sanitized?
Accessibility ensures everyone can access the content, while usability focuses on how easy it is to use the website. Together, they create the best possible user experience.
From an accessibility perspective, we must ensure that everyone not only knows the field is invalid but also understands the error message.
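One way to do this, sketched below with hypothetical IDs, is to wire the error message to the field with standard ARIA attributes so screen readers announce both the invalid state and the message:

```typescript
const field = document.querySelector<HTMLInputElement>("#email")!;
const error = document.querySelector<HTMLElement>("#email-error")!;

// Associate the error text with the field so assistive technology reads them together.
field.setAttribute("aria-describedby", "email-error");

// role="alert" causes the message to be announced as soon as it appears.
error.setAttribute("role", "alert");

field.addEventListener("blur", () => {
  const valid = field.validity.valid;
  field.setAttribute("aria-invalid", String(!valid));
  error.textContent = valid ? "" : field.validationMessage;
});
```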
Below is a demonstration of the form running on VoiceOver, the built-in screen reader for macOS.
Code:
Form Validation Tip #10 · hritik5102/Form-Validation-Tips@3d96464
You can find the source code for the above tips and tricks in the below repository:
GitHub - hritik5102/Form-Validation-Tips: Form Validation Tips Every Web Developer should know!
Thank you for taking the time to read the post until the end. Your attention and interest are greatly appreciated.
Please 👏🏻 if you like this post. It will motivate me to continue creating high-quality content like this one.
Thank you for taking the time to read my blog post! If you found it valuable, I would greatly appreciate it if you could share the post on Twitter and LinkedIn, etc. Your support in spreading the word about my content means a lot to me. Thank you again!
I hope you found this post helpful. If you want to stay up-to-date with my latest work, be sure to follow me on Twitter, LinkedIn, and GitHub.
Form Validation Tips Every Web Developer Should Know! was originally published in helpshift-engineering on Medium, where people are continuing the conversation by highlighting and responding to this story.
David J. Berg*, David Casler^, Romain Cledat*, Qian Huang*, Rui Lin*, Nissan Pow*, Nurcan Sonmez*, Shashank Srikanth*, Chaoying Wang*, Regina Wang*, Darin Yu*
*: Model Development Team, Machine Learning Platform
^: Content Demand Modeling Team
A month ago at QConSF, we showcased how Netflix utilizes Metaflow to power a diverse set of ML and AI use cases, managing thousands of unique Metaflow flows. This followed a previous blog on the same topic. Many of these projects are under constant development by dedicated teams with their own business goals and development best practices, such as the system that supports our content decision makers, or the system that ranks which language subtitles are most valuable for a specific piece of content.
As a central ML and AI platform team, our role is to empower our partner teams with tools that maximize their productivity and effectiveness, while adapting to their specific needs (not the other way around). This has been a guiding design principle with Metaflow since its inception.
Standing on the shoulders of our extensive cloud infrastructure, Metaflow facilitates easy access to data, compute, and production-grade workflow orchestration, as well as built-in best practices for common concerns such as collaboration, versioning, dependency management, and observability, which teams use to set up ML/AI experiments and systems that work for them. As a result, Metaflow users at Netflix have been able to run millions of experiments over the past few years without wasting time on low-level concerns.
While Metaflow aims to be un-opinionated about some of the upper levels of the stack, some teams within Netflix have developed their own opinionated tooling. As part of Metaflow’s adaptation to their specific needs, we constantly try to understand what has been developed and, more importantly, what gaps these solutions are filling.
In some cases, we determine that the gap being addressed is very team specific, or too opinionated at too high a level in the stack, and we therefore decide to not develop it within Metaflow. In other cases, however, we realize that we can develop an underlying construct that aids in filling that gap. Note that even in that case, we do not always aim to completely fill the gap and instead focus on extracting a more general lower level concept that can be leveraged by that particular user but also by others. One such recurring pattern we noticed at Netflix is the need to deploy sets of closely related flows, often as part of a larger pipeline involving table creations, ETLs, and deployment jobs. Frequently, practitioners want to experiment with variants of these flows, testing new data, new parameterizations, or new algorithms, while keeping the overall structure of the flow or flows intact.
A natural solution is to make flows configurable using configuration files, so variants can be defined without changing the code. Thus far, there hasn't been a built-in solution for configuring flows, so teams have built bespoke solutions, either leveraging Metaflow's JSON-typed Parameters, IncludeFile, and deploy-time Parameters, or deploying their own home-grown tooling (often with great pain). However, none of these solutions make it easy to configure all aspects of the flow's behavior, decorators in particular.
Outside Netflix, we have seen similar frequently asked questions on the Metaflow community Slack as shown in the user quotes above:
Today, to answer the FAQ, we introduce a new — small but mighty — feature in Metaflow: a Config object. Configs complement the existing Metaflow constructs of artifacts and Parameters, by allowing you to configure all aspects of the flow, decorators in particular, prior to any run starting. At the end of the day, artifacts, Parameters and Configs are all stored as artifacts by Metaflow but they differ in when they are persisted as shown in the diagram below:
Said another way:
As an example, you can specify a Config that reads a pleasantly human-readable configuration file, formatted as TOML. The Config specifies an ‘@schedule’ trigger, ‘@resources’ requirements, and application-specific parameters for this particular deployment:
[schedule]
cron = "0 * * * *"
[model]
optimizer = "adam"
learning_rate = 0.5
[resources]
cpu = 1
Using the newly released Metaflow 2.13, you can configure a flow with a Config like above, as demonstrated by this flow:
import pprint
from metaflow import FlowSpec, step, Config, resources, config_expr, schedule


@schedule(cron=config_expr("config.schedule.cron"))
class ConfigurableFlow(FlowSpec):
    config = Config("config", default="myconfig.toml", parser="tomllib.loads")

    @resources(cpu=config.resources.cpu)
    @step
    def start(self):
        print("Config loaded:")
        pprint.pp(self.config)
        self.next(self.end)

    @step
    def end(self):
        pass


if __name__ == "__main__":
    ConfigurableFlow()
There is a lot going on in the code above; a few highlights:
From the developer’s point of view, Configs behave like dictionary-like artifacts. For convenience, they support the dot-syntax (when possible) for accessing keys, making it easy to access values in a nested configuration. You can also unpack the whole Config (or a subtree of it) with Python’s standard dictionary unpacking syntax, ‘**config’. The standard dictionary subscript notation is also available.
Since Configs turn into dictionary artifacts, they get versioned and persisted automatically as artifacts. You can access Configs of any past runs easily through the Client API. As a result, your data, models, code, Parameters, Configs, and execution environments are all stored as a consistent bundle — neatly organized in Metaflow namespaces — paving the way for easily reproducible, consistent, low-boilerplate, and now easily configurable experiments and robust production deployments.
While you can get far by accompanying your flow with a simple config file (stored in your favorite format, thanks to user-definable parsers), Configs unlock a number of advanced use cases. Consider these examples from the updated documentation:
A major benefit of Config over previous more hacky solutions for configuring flows is that they work seamlessly with other features of Metaflow: you can run steps remotely and deploy flows to production, even when relying on custom parsers, without having to worry about packaging Configs or parsers manually or keeping Configs consistent across tasks. Configs also work with the Runner and Deployer.
When used in conjunction with a configuration manager like Hydra, Configs enable a pattern that is highly relevant for ML and AI use cases: orchestrating experiments over multiple configurations or sweeping over parameter spaces. While Metaflow has always supported sweeping over parameter grids easily using foreaches, it hasn’t been easily possible to alter the flow itself, e.g. to change @resources or @pypi/@conda dependencies for every experiment.
In a typical case, you trigger a Metaflow flow that consumes a configuration file, changing how a run behaves. With Hydra, you can invert the control: it is Hydra that decides what gets run based on a configuration file. Thanks to Metaflow’s new Runner and Deployer APIs, you can create a Hydra app that operates Metaflow programmatically — for instance, to deploy and execute hundreds of variants of a flow in a large-scale experiment.
Take a look at two interesting examples of this pattern in the documentation. As a teaser, this video shows Hydra orchestrating deployment of tens of Metaflow flows, each of which benchmarks PyTorch using a varying number of CPU cores and tensor sizes, updating a visualization of the results in real-time as the experiment progresses:
To give a motivating example of what configurations look like at Netflix in practice, let’s consider Metaboost, an internal Netflix CLI tool that helps ML practitioners manage, develop and execute their cross-platform projects, somewhat similar to the open-source Hydra discussed above but with specific integrations to the Netflix ecosystem. Metaboost is an example of an opinionated framework developed by a team already using Metaflow. In fact, a part of the inspiration for introducing Configs in Metaflow came from this very use case.
Metaboost serves as a single interface to three different internal platforms at Netflix that manage ETL/Workflows (Maestro), Machine Learning Pipelines (Metaflow) and Data Warehouse Tables (Kragle). In this context, having a single configuration system to manage a ML project holistically gives users increased project coherence and decreased project risk.
Ease of configuration and templatizing are core values of Metaboost. Templatizing in Metaboost is achieved through the concept of bindings, wherein we can bind a Metaflow pipeline to an arbitrary label, and then create a corresponding bespoke configuration for that label. The binding-connected configuration is then merged into a global set of configurations containing such information as the Git repository, branch, etc. Binding a Metaflow flow also signals to Metaboost that it should instantiate the flow once per binding into our orchestration cluster.
Imagine a ML practitioner on the Netflix Content ML team, sourcing features from hundreds of columns in our data warehouse, and creating a multitude of models against a growing suite of metrics. When a brand new content metric comes along, with Metaboost, the first version of the metric’s predictive model can easily be created by simply swapping the target column against which the model is trained.
Subsequent versions of the model will result from experimenting with hyperparameters, tweaking feature engineering, or conducting feature diets. Metaboost’s bindings, and their integration with Metaflow Configs, can be leveraged to scale the number of experiments as fast as a scientist can create experiment-based configurations.
Consider a Metaboost ML project named `demo` that creates and loads data to custom tables (ETL managed by Maestro), and then trains a simple model on this data (ML Pipeline managed by Metaflow). The project structure of this repository might look like the following:
├── metaflows
│   ├── custom                  -> custom python code, used by Metaflow
│   │   ├── data.py
│   │   └── model.py
│   └── training.py             -> defines our Metaflow pipeline
├── schemas
│   ├── demo_features_f.tbl.yaml    -> table DDL, stores our ETL output, Metaflow input
│   └── demo_predictions_f.tbl.yaml -> table DDL, stores our Metaflow output
├── settings
│   ├── settings.configuration.EXP_01.yaml -> defines the additive config for Experiment 1
│   ├── settings.configuration.EXP_02.yaml -> defines the additive config for Experiment 2
│   ├── settings.configuration.yaml        -> defines our global configuration
│   └── settings.environment.yaml          -> defines parameters based on git branch (e.g. READ_DB)
├── tests
├── workflows
│   ├── sql
│   ├── demo.demo_features_f.sch.yaml -> Maestro workflow, defines ETL
│   └── demo.main.sch.yaml            -> Maestro workflow, orchestrates ETLs and Metaflow
└── metaboost.yaml                    -> defines our project for Metaboost
The configuration files in the settings directory above contain the following YAML files:
# settings.configuration.yaml (global configuration)
model:
  fit_intercept: True
conda:
  numpy: '1.22.4'
  "scikit-learn": '1.4.0'

# settings.configuration.EXP_01.yaml
target_column: metricA
features:
  - runtime
  - content_type
  - top_billed_talent

# settings.configuration.EXP_02.yaml
target_column: metricA
features:
  - runtime
  - director
  - box_office
Metaboost will merge each experiment configuration (*.EXP*.yaml) into the global configuration (settings.configuration.yaml) individually at Metaboost command initialization. Let’s take a look at how Metaboost combines these configurations with a Metaboost command:
(venv-demo) ~/projects/metaboost-demo [branch=demoX]
$ metaboost metaflow settings show --yaml-path=configuration
binding=EXP_01:
  model:                  -> defined in settings.configuration.yaml (global)
    fit_intercept: true
  conda:                  -> defined in settings.configuration.yaml (global)
    numpy: 1.22.4
    "scikit-learn": 1.4.0
  target_column: metricA  -> defined in settings.configuration.EXP_01.yaml
  features:               -> defined in settings.configuration.EXP_01.yaml
    - runtime
    - content_type
    - top_billed_talent

binding=EXP_02:
  model:                  -> defined in settings.configuration.yaml (global)
    fit_intercept: true
  conda:                  -> defined in settings.configuration.yaml (global)
    numpy: 1.22.4
    "scikit-learn": 1.4.0
  target_column: metricA  -> defined in settings.configuration.EXP_02.yaml
  features:               -> defined in settings.configuration.EXP_02.yaml
    - runtime
    - director
    - box_office
Metaboost understands it should deploy/run two independent instances of training.py — one for the EXP_01 binding and one for the EXP_02 binding. You can also see that Metaboost is aware that the tables and ETL workflows are not bound, and should only be deployed once. These details of which artifacts to bind and which to leave unbound are encoded in the project’s top-level metaboost.yaml file.
(venv-demo) ~/projects/metaboost-demo [branch=demoX]
$ metaboost project list
Tables (metaboost table list):
  schemas/demo_predictions_f.tbl.yaml (binding=default):
    table_path=prodhive/demo_db/demo_predictions_f
  schemas/demo_features_f.tbl.yaml (binding=default):
    table_path=prodhive/demo_db/demo_features_f

Workflows (metaboost workflow list):
  workflows/demo.demo_features_f.sch.yaml (binding=default):
    cluster=sandbox, workflow.id=demo.branch_demox.demo_features_f
  workflows/demo.main.sch.yaml (binding=default):
    cluster=sandbox, workflow.id=demo.branch_demox.main

Metaflows (metaboost metaflow list):
  metaflows/training.py (binding=EXP_01):  -> EXP_01 instance of training.py
    cluster=sandbox, workflow.id=demo.branch_demox.EXP_01.training
  metaflows/training.py (binding=EXP_02):  -> EXP_02 instance of training.py
    cluster=sandbox, workflow.id=demo.branch_demox.EXP_02.training
Below is a simple Metaflow pipeline that fetches data, executes feature engineering, and trains a LinearRegression model. The work to integrate Metaboost Settings into a user’s Metaflow pipeline (implemented using Metaflow Configs) is as easy as adding a single mix-in to the FlowSpec definition:
from metaflow import FlowSpec, Parameter, conda_base, step
from custom.data import feature_engineer, get_data
from metaflow.metaboost import MetaboostSettings


@conda_base(
    libraries=MetaboostSettings.get_deploy_time_settings("configuration.conda")
)
class DemoTraining(FlowSpec, MetaboostSettings):
    prediction_date = Parameter("prediction_date", type=int, default=-1)

    @step
    def start(self):
        # get show_settings() for free with the mixin
        # and get convenient debugging info
        self.show_settings(exclude_patterns=["artifact*", "system*"])
        self.next(self.get_features)

    @step
    def get_features(self):
        # feature engineers on our extracted data
        self.fe_df = feature_engineer(
            # loads data from our ETL pipeline
            data=get_data(prediction_date=self.prediction_date),
            features=self.settings.configuration.features
            + [self.settings.configuration.target_column],
        )
        self.next(self.train)

    @step
    def train(self):
        from sklearn.linear_model import LinearRegression

        # trains our model
        self.model = LinearRegression(
            fit_intercept=self.settings.configuration.model.fit_intercept
        ).fit(
            X=self.fe_df[self.settings.configuration.features],
            y=self.fe_df[self.settings.configuration.target_column],
        )
        print(f"Fit slope: {self.model.coef_[0]}")
        print(f"Fit intercept: {self.model.intercept_}")
        self.next(self.end)

    @step
    def end(self):
        pass


if __name__ == "__main__":
    DemoTraining()
The Metaflow Config is added to the FlowSpec by mixing in the MetaboostSettings class. Referencing a configuration value is as easy as using the dot syntax to drill into whichever parameter you’d like.
Finally let’s take a look at the output from our sample Metaflow above. We execute experiment EXP_01 with
metaboost metaflow run --binding=EXP_01
which upon execution will merge the configurations into a single settings file (shown previously) and serialize it as a yaml file to the .metaboost/settings/compiled/ directory.
You can see the actual command and args that were sub-processed in the Metaboost Execution section below. Please note the --config argument pointing to the serialized YAML file, which is then accessible via self.settings. Also note the convenient printing of configuration values to stdout during the start step, using a mixed-in function named show_settings().
(venv-demo) ~/projects/metaboost-demo [branch=demoX]
$ metaboost metaflow run --binding=EXP_01
Metaboost Execution:
- python3.10 /root/repos/cdm-metaboost-irl/metaflows/training.py
--no-pylint --package-suffixes=.py --environment=conda
--config settings
.metaboost/settings/compiled/settings.branch_demox.EXP_01.training.mP4eIStG.yaml
run --prediction_date 20241006
Metaflow 2.12.39+nflxfastdata(2.13.5);nflx(2.13.5);metaboost(0.0.27)
executing DemoTraining for user:dcasler
Validating your flow...
The graph looks good!
Bootstrapping Conda environment... (this could take a few minutes)
All packages already cached in s3.
All environments already cached in s3.
Workflow starting (run-id 50), see it in the UI at
https://metaflowui.prod.netflix.net/DemoTraining/50
[50/start/251640833] Task is starting.
[50/start/251640833] Configuration Values:
[50/start/251640833] settings.configuration.conda.numpy = 1.22.4
[50/start/251640833] settings.configuration.features.0 = runtime
[50/start/251640833] settings.configuration.features.1 = content_type
[50/start/251640833] settings.configuration.features.2 = top_billed_talent
[50/start/251640833] settings.configuration.model.fit_intercept = True
[50/start/251640833] settings.configuration.target_column = metricA
[50/start/251640833] settings.environment.READ_DATABASE = data_warehouse_prod
[50/start/251640833] settings.environment.TARGET_DATABASE = demo_dev
[50/start/251640833] Task finished successfully.
[50/get_features/251640840] Task is starting.
[50/get_features/251640840] Task finished successfully.
[50/train/251640854] Task is starting.
[50/train/251640854] Fit slope: 0.4702672504331096
[50/train/251640854] Fit intercept: -6.247919678070083
[50/train/251640854] Task finished successfully.
[50/end/251640868] Task is starting.
[50/end/251640868] Task finished successfully.
Done! See the run in the UI at
https://metaflowui.prod.netflix.net/DemoTraining/50
Metaboost is an integration tool that aims to ease the project development, management and execution burden of ML projects at Netflix. It employs a configuration system that combines git-based parameters, global configurations, and arbitrarily bound configuration files for use during execution against internal Netflix platforms.
Integrating this configuration system with the new Config in Metaflow is incredibly simple (by design), only requiring users to add a mix-in class to their FlowSpec — similar to this example in Metaflow documentation — and then reference the configuration values in steps or decorators. The example above templatizes a training Metaflow for the sake of experimentation, but users could just as easily use bindings/configs to templatize their flows across target metrics, business initiatives or any other arbitrary lines of work.
It couldn’t be easier to get started with Configs! Just
pip install -U metaflow
to get the latest version and head to the updated documentation for examples. If you are impatient, you can find and execute all config-related examples in this repository as well.
If you have any questions or feedback about Config (or other Metaflow features), you can reach out to us at the Metaflow community Slack.
We would like to thank Outerbounds for their collaboration on this feature; for rigorously testing it and developing a repository of examples to showcase some of the possibilities offered by this feature.
Introducing Configurable Metaflow was originally published in Netflix TechBlog on Medium, where people are continuing the conversation by highlighting and responding to this story.
Editor’s note: This is the second post in a series that explores a range of topics about upcoming AI regulation, including an overview of the EU AI Act and Palantir solutions that foster and support regulatory compliance when using AI. This blog post provides an overview of how Palantir AIP empowers organizations to meet some of their key compliance obligations through core capabilities available in-platform, with a special focus on High-Risk AI Systems. In-house legal and compliance readers in particular will find in it a useful digest of key considerations for their organizations.
The rapid adoption of AI, including Generative AI, has driven policymakers worldwide to develop regulations governing the development and deployment of AI systems, particularly those impacting fundamental rights and safety. In the first post of this series, we provided an overview of one such regulation: the EU AI Act.
As these regulations mature, organizations will be responsible for implementing a wide variety of compliance requirements throughout the AI lifecycle. Palantir AIP empowers organizations to meet many key compliance requirements efficiently and effectively in two primary ways:
In this blog post, we highlight AIP’s core capabilities that enable institutional compliance for organizations leveraging AI, focusing on three key themes emphasized by emerging AI regulations such as the EU AI Act: Data Governance, Risk Management, and Record Keeping. We’ll address custom applications in a future post.
Within each of these themes, Palantir AIP offers an integrated toolset that can assist customers in defining and implementing processes to meet compliance requirements effectively. This includes in-platform tools for tracking, maintenance, and continuous monitoring of AI applications leveraging LLMs and other advanced multi-modal AI models. Our approach enables customers to deploy AI solutions confidently, ensuring they are effective for their use cases and organizational outcomes, while tangibly advancing transparency and trustworthiness principles that align with the core requirements of emerging regulations like the EU AI Act.
We know from experience that AI systems hold significant promise for helping to address mission-critical challenges — from inventory entity resolution to infection risk alerting — in real-world settings. Unlocking this transformative potential hinges on integrating advanced AI models with high-quality, relevant data tailored to their operational environments. That’s why a robust data governance strategy is the foundation of any effective AI governance framework.
By enabling organizations to implement comprehensive data governance throughout the AI lifecycle, organizations can ensure that every stage — from data collection to model training and, eventually, model deployment — maintains the highest standards of quality and relevance. This enables AI systems to drive impactful outcomes while mitigating risks to fundamental rights and safety. In doing so, institutions are better positioned to ensure compliance with data governance requirements for AI systems. For instance, Article 10 of the EU AI Act mandates that “training, validation and testing data sets shall be subject to data governance and management practices.” Adhering to such regulations helps organizations maintain the integrity and efficacy of their AI initiatives.
Throughout every stage of the AI lifecycle, Palantir AIP empowers users to ensure that the appropriate individuals have the correct access to the necessary information. Security starts with identity, so AIP uses strong authentication methods like single sign-on (SSO) and multi-factor authentication (MFA) to confirm users are who they say they are.
Palantir AIP simplifies the process of organizing users into groups, whether managed internally or through an organization’s existing identity providers. Once verified, Palantir’s role-based authorization grants access for users and groups based on precise permissions, so that users interact only with the data that they are authorized to see. By clearly defining group boundaries and maintaining strict separation between different teams and resources, organizations can use AIP to help keep sensitive data compartmentalized and protected.
Additionally, AIP’s Markings feature adds mandatory controls, allowing administrators to require special permissions for users to find or access highly sensitive information. For cases requiring even more precise control, granular access controls allow organizations to define exactly who can access specific data and resources down to individual rows or columns using mechanisms like Restricted Views. This exceptional level of control ensures that sensitive or restricted information is accessible only to authorized individuals.
With Palantir AIP, organizations can foster collaboration among employees while confidently relying on a foundation of configurable access controls. Strict controls and thorough documentation of data access are essential for providers of high-risk AI systems. According to Article 10(5)(c) of the EU AI Act, providers of high-risk AI systems must implement appropriate safeguards for handling personal data. Consistent and standardized access management empowers institutions to protect their most valuable assets and maintain compliance with emerging AI regulations, as well as other applicable security and data protection requirements.
Trustworthy AI applications are built on high-quality, reliable data foundations. Article 10 of the EU AI Act highlights the importance of data preparation processes, especially for training models used in high-risk AI systems. This regulation requires institutions to carefully consider operations such as labeling, cleaning, and aggregating their data.
Data cleaning is a process that involves refining and enriching datasets of all types (e.g., structured data, streaming data, media data, etc.) to produce high-quality information. Palantir AIP provides a suite of tools that enable users across the technical spectrum to conduct these data cleaning tasks. Applications like Pipeline Builder and Contour offer no-code environments for users to build and manage data pipelines without writing code. On the other end of the spectrum, Code Repositories provide pro-code environments where developers can implement custom data transformations. Additionally, Data Lineage offers a dynamic visual representation of all steps from raw data ingestion to the finalized dataset.
These tools not only assist users in performing the necessary data cleaning for their AI systems but also maintain records of these activities. This enhances transparency and supports adherence to compliance standards, as outlined in EU AI Act Article 11 and Annex IV. These sections of the Act emphasize the need for organizations to produce technical documentation demonstrating the architecture of the high-risk AI system, including data cleaning methodologies.
Mitigating risk and bias in the use of AI models and tools involves multiple facets. In this section, we discuss various mitigation strategies to reduce risks related to data access and the testing and evaluation of models to mitigate bias, as outlined in Article 10(2)(f) and (g) of the EU AI Act, which explicitly require high-risk AI system providers to examine and implement measures to mitigate potential biases.
The Palantir Platform offers tools such as the Sensitive Data Scanner (SDS) to mitigate the risk of broad data access. SDS can be used to identify protected characteristics in datasets, marking them appropriately to protect the information from broad accessibility. By controlling access to sensitive attributes that could introduce bias — such as gender, age, or physical home address — SDS can help users handle this data more carefully and prevent these attributes from inappropriately influencing model outcomes. In tandem, administrators can require operational users to justify the processing of user-defined special categories of data through tools like the Checkpoints application. This helps users ensure that the use of sensitive data is deliberate and justifiable, reducing the risk of improper data processing. Users can also utilize obfuscation tools for enhanced encryption capabilities and bolstered data minimization best practices. Data minimization reduces the exposure of irrelevant or sensitive information that could contribute to biased results, thereby mitigating bias at the data level.
Palantir software empowers users to leverage AI models by providing robust tools for testing and evaluation to demonstrate confidence in model use cases before pushing them to production environments. It offers organizations the capability to manage and select LLM models of their choosing while offering the flexibility to develop custom machine learning models within the platform.
For LLM use, applications like AIP Logic allow non-technical users to create environments for building, testing, and releasing functions powered by LLMs. A key component of AIP Logic is its “Evaluations” feature, enabling organizations to test use cases, evaluate functions, and validate that the output aligns with industry standards. By rigorously testing models against diverse datasets and evaluation metrics, organizations can identify and correct biased behaviors at all stages of the AI lifecycle.
For models developed within the platform, customers can configure their own evaluation libraries to measure model performance, fairness, robustness, and other metrics. These customizable evaluation libraries enable users to assess model fairness explicitly, providing insights into potential biases and facilitating corrective measures.
Many regulatory frameworks mandate that data should not be accessible once it is no longer needed for a valid purpose or when retention periods expire. For example, Article 10(5)(e) of the EU AI Act requires high-risk AI system providers to delete personal data once the bias has been corrected or the data has reached its retention period.
To address these requirements, the Palantir Platform offers methods to set retention rules, ensuring proper data removal within organization-defined timeframes. Importantly, Palantir software further allows lineage-aware deletions, ensuring the removal of specific datasets as well as related downstream information. This can be configured using the Data Lifetime application. This comprehensive approach to data management facilitates compliance with data retention and deletion regulations, further enhancing the robustness of your data governance strategy.
As AI systems become more integrated into the critical operations of businesses, managing the associated risks becomes imperative. Certain regulations, like the EU AI Act, explicitly mandate organizations to establish a risk management program. Specifically, Article 9 of the EU AI Act requires risk management systems be implemented for high-risk AI systems, while Article 55 mandates providers of General-Purpose AI (GPAI) models to evaluate these models and identify and mitigate systematic risks. This section delves into how AIP’s features can help organizations understand the AI system lifecycle through systematic reviews and oversight of updates, aiding compliance with legal requirements.
Palantir’s Modeling Objectives provide a structured framework to document and manage the risks associated with AI systems. This feature enables organizations to collaboratively evaluate models and ensure they pass predefined quality checks before being operationalized and pushed to production for broader adoption. Modeling Objectives also enables organizations to facilitate staged rollout and release management with Model Asset Versioning, providing a controlled environment to manage the use of AI models.
Organizations have the ability to create and test models within Palantir’s Platform and use the Model Catalog to access and review key characteristics of open-source models or those for which they already hold existing licensing. Furthermore, Palantir’s Platform enables organizations to integrate externally-hosted models stored in third-party applications, services, or model providers such as Azure Machine Learning, Amazon SageMaker, OpenAI, or Anthropic. Whether organizations choose to create or fine-tune their own models, use open-source ones, or integrate externally-hosted models, the Palantir Platform offers full versioning, granular model permissions, and governed model lineage. These features are designed not only to enhance the efficiency and effectiveness of outcomes but also to significantly increase transparency.
Access controls within AIP are pivotal for targeted risk management measures, allowing administrators to manage who can build, evaluate, and interact with AI systems. Key features include:
AIP Evals is Palantir’s framework for transitioning AI prototypes into reliable production workflows. It emphasizes rigorous Testing and Evaluation (T&E) to ensure consistent and effective real-world performance. Designed for Generative AI, it handles non-determinism and specific input debugging. Through systematic reviews, unit testing, and continuous improvement, AIP Evals enables empirical validation and refinement of AI systems, ensuring transparency and trust for scalable, responsible AI deployment. One of our previous blog posts provides step-by-step guidance on creating a prototype AI system and taking it all the way to production.
AI systems are required to automatically record events throughout their lifespan and use. According to Article 12 of the EU AI Act, high-risk AI systems must support automatic record keeping of event logs and other critical logging capabilities. In this context, it's important to note that audit logs from all Palantir platforms are generated and made available to customers with administrator access rights. These logs are initially written to disk and then archived to an environment-specific storage bucket within 24 hours (e.g., AWS S3, Azure Blob Storage, on-premises storage). Palantir customers have the option to enable audit infrastructure that exports audit logs from the archive to a per-organization dataset for analysis within the platform or a downstream SIEM. For more information on the monitoring of security logs, please refer to our documentation.
Palantir offers tools that enable organizations to closely monitor their most sensitive workflows. This is particularly important when it comes to changes in the ontology, which are often viewed as highly sensitive due to the impact they can have on data interpretation and analysis.
The Palantir Ontology acts as a structured framework that organizes and interprets data, providing a shared vocabulary for different systems, applications, and users. This consistent ontology ensures seamless data integration from various sources, facilitating easy and unified data querying and analysis.
Alterations to the ontology in Palantir’s Platform are primarily executed through ‘Actions,’ which can trigger associated downstream processes. These modifications usually stem from specific decisions. The Action Log feature streamlines the creation and maintenance of object types that represent these decisions. To visualize these Action Logs, Palantir provides customizable widgets within its platform, tailored to the needs of each organization. This capability allows organizations to gain a comprehensive understanding of the broader context and decision-making processes behind ontology changes.
Edit history meticulously logs all changes made to an object, detailing what was changed, by whom, and when. It tracks specific modifications to object properties, offering a focused record of edits over time. This level of transparency and control is crucial for maintaining data integrity and ensuring compliance with regulatory requirements.
Palantir has a long-standing commitment to upholding privacy-first principles. In addition to the development of new features, Palantir continually emphasizes the importance of robust auditing mechanisms. These mechanisms serve as a backstop to ensure that the features and users’ interactions with them can be reviewed as needed for consistency with institutional policies and regulatory requirements. This dual focus ensures that as the platform evolves, it remains in line with the highest standards of data privacy and transparency.
Navigating the complexities of AI compliance is an evolving challenge that requires robust and adaptive solutions. As regulations like the EU AI Act come into effect, organizations must be prepared to implement comprehensive compliance measures across the entire AI lifecycle. Palantir AIP stands out by offering a powerful combination of core capabilities and custom application tools that enable organizations to meet these stringent regulatory requirements efficiently and effectively.
By focusing on key themes such as Data Governance, Risk Management, and Record Keeping, Palantir AIP provides an integrated toolset that helps organizations define, implement, and monitor compliance processes seamlessly. The platform’s in-built tools for tracking, maintenance, and continuous monitoring ensure that AI applications, including those leveraging advanced models like LLMs, remain transparent and trustworthy.
In summary, Palantir AIP empowers organizations to innovate with confidence while adhering to emerging regulatory standards. Future posts will delve deeper into custom applications, illustrating how organizations can tailor their compliance strategies to meet specific needs. By adopting Palantir AIP, organizations are better positioned to advance their AI capabilities while maintaining rigorous compliance, setting a benchmark for responsible and sustainable AI deployment.
Annabelle Larose, Senior Technical Program Manager, Privacy & Civil Liberties Engineering
Colton Rusch, Privacy and Civil Liberties Engineer
AI Systems Governance through the Palantir Platform was originally published in Palantir Blog on Medium, where people are continuing the conversation by highlighting and responding to this story.
How Airbnb is adapting ranking for our map interface.
Malay Haldar, Hongwei Zhang, Kedar Bellare, Sherry Chen
Search is the core mechanism that connects guests with Hosts at Airbnb. Results from a guest’s search for listings are displayed through two interfaces: (1) as a list of rectangular cards that contain the listing image, price, rating, and other details on it, referred to as list-results and (2) as oval pins on a map showing the listing price, called map-results. Since its inception, the core of the ranking algorithm that powered both these interfaces was the same — ordering listings by their booking probabilities and selecting the top listings for display.
But some of the basic assumptions underlying ranking, built for a world where search results are presented as lists, simply break down for maps.
The central concept that drives ranking for list-results is that user attention decays starting from the top of the list, going down towards the bottom. A plot of rank vs click-through rates in Figure 1 illustrates this concept. X-axis represents the rank of listings in search results. Y-axis represents the click-through rate (CTR) for listings at the particular rank.
To maximize the connections between guests and Hosts, the ranking algorithm sorts listings by their booking probabilities based on a number of factors and sequentially assigns their position in the list-results. This often means that the larger a listing’s booking probability, the more attention it receives from searchers.
But in map-results, listings are scattered as pins over an area (see Figure 2). There is no ranked list, and there is no decay of user attention by ranking position. Therefore, for listings that are shown on the map, the strategy of sorting by booking probabilities is no longer applicable.
To adapt ranking to the map interface, we look at new ways of modeling user attention flow across a map. We start with the most straightforward assumption that user attention is spread equally across the map pins. User attention is a very precious commodity and most searchers only click through a few map pins (see Figure 3). A large number of pins on the map means those limited clicks may miss discovering the best options available. Conversely, limiting the number of pins to the topmost choices increases the probability of the searcher finding something suitable, but runs the risk of removing their preferred choice.
We test this hypothesis with a tunable parameter that serves as an upper bound on the ratio between the highest and the lowest booking probability among the selected map pins. The bound controls the booking probabilities of the listings behind the map pins: the tighter the bound, the higher the average booking probability of the listings presented as map pins. Figure 4 summarizes the results from A/B testing a range of parameter values.
The reduction in the average impressions-to-discovery metric in Figure 4 reflects the smaller number of map pins a searcher has to process before clicking the listing that they eventually book. Similarly, the reduction in average clicks to discovery shows the smaller number of map pins a searcher has to click through to find the listing they booked.
Launching the restricted version resulted in one of the largest bookings improvements in Airbnb ranking history. More importantly, the gains were not only in bookings, but in quality bookings: trips from the treatment group received a 5-star rating after the stay more often than trips from the control group.
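A simplified sketch of that selection rule is shown below. All names, and the exact form of the bound, are illustrative assumptions rather than the production implementation:

```typescript
interface Listing {
  id: string;
  bookingProbability: number;
}

// Keep only listings whose booking probability is within a factor `maxRatio`
// of the best listing's probability; a tighter ratio yields fewer,
// higher-probability map pins.
function selectMapPins(listings: Listing[], maxRatio: number): Listing[] {
  const sorted = [...listings].sort(
    (a, b) => b.bookingProbability - a.bookingProbability
  );
  if (sorted.length === 0) return [];
  const best = sorted[0].bookingProbability;
  return sorted.filter((l) => best / l.bookingProbability <= maxRatio);
}
```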
In our next iteration of modeling user attention, we separate the map pins into two tiers. The listings with the highest booking probabilities are displayed as regular oval pins with price. Listings with comparatively lower booking probabilities are displayed as smaller ovals without price, referred to as mini-pins (Figure 5). By design, mini-pins draw less user attention, with click-through rates about 8x less than regular pins.
This comes in handy particularly for searches on desktop where 18 results are shown in a grid on the left, each of them requiring a map pin on the right (Figure 6).
The number of map pins is fixed in this case, and limiting them, as we did in the previous section, is not an option. Creating the two tiers prioritizes user attention towards the map pins with the highest probabilities of getting booked. Figure 7 shows the results of testing the idea through an online A/B experiment.
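A hypothetical sketch of the two-tier split, reusing the Listing shape from the previous sketch: the strongest listings become regular price pins and the remainder become mini-pins.

```typescript
// Hypothetical: split a fixed set of listings into regular pins and mini-pins.
function assignPinTiers(listings: Listing[], regularPinCount: number) {
  const sorted = [...listings].sort(
    (a, b) => b.bookingProbability - a.bookingProbability
  );
  return {
    regularPins: sorted.slice(0, regularPinCount), // oval pins with price
    miniPins: sorted.slice(regularPinCount),       // smaller ovals without price
  };
}
```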
In our final iteration, we refine our understanding of how user attention is distributed over the map by plotting the click-through rate of map pins located at different coordinates on the map. Figure 8 shows these plots for the mobile (top) and the desktop apps (bottom).
To maximize the chances that a searcher will discover the listings with the highest booking probabilities, we design an algorithm that re-centers the map such that the listings with the highest booking probabilities appear closer to the center. The steps of this algorithm are illustrated in Figure 9, where a range of potential coordinates are evaluated and the one which is closer to the listings with the highest booking probabilities is chosen as the new center.
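A rough sketch of the re-centering idea: score each candidate center by a booking-probability-weighted distance to the listings and pick the best one. The scoring function and names are illustrative assumptions, not the exact algorithm described in the paper.

```typescript
interface Pin {
  lat: number;
  lng: number;
  bookingProbability: number;
}

// Lower score means high-probability pins sit closer to the candidate center.
function weightedDistance(center: { lat: number; lng: number }, pins: Pin[]): number {
  return pins.reduce(
    (sum, p) =>
      sum + p.bookingProbability * Math.hypot(p.lat - center.lat, p.lng - center.lng),
    0
  );
}

// Evaluate a set of candidate centers and choose the one with the best score.
function recenterMap(candidates: { lat: number; lng: number }[], pins: Pin[]) {
  return candidates.reduce((best, candidate) =>
    weightedDistance(candidate, pins) < weightedDistance(best, pins) ? candidate : best
  );
}
```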
When tested in an online A/B experiment, the algorithm improved uncancelled bookings by 0.27%. We also observed a reduction of 1.5% in map moves, indicating less effort from the searchers to use the map.
Users interact with maps in a way that’s fundamentally different from interacting with items in a list. By modeling the user interaction with maps in a progressively sophisticated manner, we were able to improve the user experience for guests in the real world. However, the current approach has a challenge that remains unsolved: how can we represent the full range of available listings on the map? This is part of our future work. A more in-depth discussion of the topics covered here, along with technical details, is presented in our research paper that was published at the KDD ’24 conference. We welcome all feedback and suggestions.
If this type of work interests you, we encourage you to apply for an open position today.
Improving Search Ranking for Maps was originally published in The Airbnb Tech Blog on Medium, where people are continuing the conversation by highlighting and responding to this story.
Written by Alex Chantavy
Today we’re thrilled to announce that Lyft has donated Cartography to the Cloud Native Computing Foundation (CNCF). Since Lyft open sourced it in 2019, it’s been rewarding to see the project grow from an experimental tool to a solution that’s been battle-tested in production by multiple companies. In this post, we’ll reflect on our learnings from Cartography’s open source journey and discuss where it’s going next.
Origins and growth
Cartography was first created to find attack paths that a malicious actor could take to compromise data in cloud environments. We first used it to understand complicated IAM permissions so that we could think like an attacker and identify the shortest path to administrator privileges.
We soon realized that this graph capability was equally valuable for defenders. It also allowed us to quickly answer questions like, “Which of my services are internet-facing and running a vulnerable version of a software library?” or “Which directors at the company own the most security risk?”
In 2020, we chose to use Cartography as the backbone for our vulnerability management program because it helped us contextualize risks across our infrastructure in a way that no other tools did. As I talked about at BSidesSF, this was not an easy or smooth journey but it forced us to quickly mature the tool and improve our correctness, stability, and performance.
Lessons learned from open source
Through all this, we built a community and got to meet many of you. Matt Klein’s advice on managing an open source project (1, 2) was extremely helpful, and I’ll summarize some of my own recommendations for those considering open sourcing a project:
The impact of open sourcing Cartography
Cartography has grown to over 300 Slack members, 90 committers to the main source branch, and over a dozen companies adopting it (that we know of). It’s been incredibly rewarding to see this growth, and it was cool to see how community members built alerting around it or tried to experiment with other backend tech. Having a good open source presence helped Lyft source candidates, and as mentioned above, we were even able to make a key hire from the community. Cartography’s open source status enabled our vuln management and auditing programs to run more smoothly internally at Lyft, as the community often encountered and fixed bugs before we did. We also benefited from dozens of community-contributed modules, many of which were from former Lyft employees, and it was nice being able to collaborate with them even after they had changed companies.
The path to the CNCF
Over the past nearly 6 years working on Cartography, I believe more and more that having a self-maintaining map of your infra is a superpower and I can’t imagine working anywhere without it. I’d like to see a world where having a “Cartography-like” graph representation of infra assets becomes something of an open standard, especially since modern companies must maintain visibility over an ever-growing plethora of providers and tools.
However, one of the realities of running an open source project is that contributors (understandably) come and go with time. A successful project needs a steady stream of people discovering it, making contributions, and becoming maintainers. It became clear that growing the project wasn’t going to be possible over the long run if its steward was just one company.
In August 2023, we applied to donate Cartography to the CNCF. We pursued the CNCF in particular because Cartography was built to solve security problems that are uniquely complicated in cloud-native environments.
By donating the project we hope to:
After a long, thorough review, Cartography was finally accepted by the foundation in August 2024 — big thanks to the Technical Oversight Committee and CNCF staff for shepherding the project through the vote and onboarding it!
The future
Now that Cartography is a CNCF project, what does this mean? The only practical differences are that our Slack channel is now hosted by CNCF instead of Lyft, and our GitHub URL is now slightly different: https://github.com/cartography-cncf/cartography. Cartography will still be developed and led by those who are interested, i.e. its community members. If this project seems useful to you, please try it out and say hi in the #cartography channel on the CNCF Slack — we’d love to hear your feedback. If you think someone else would find Cartography useful, please also share it with them. There are lots of new technical directions I’d like to explore in the future but we can only do this if the community continues to grow and be supported — future blog post to come!
Working on open source has been a career highlight for me, and I like to think that we’ve done at least a little to help the information security industry think in graphs and not lists.
Thank you
Special thanks to the leadership of Lyft’s security team who have been instrumental in their support of Cartography in this multi-year open source journey: Sacha Faust (Cartography’s original creator), Chaim Sanders, Nico Waisman, Matthew Webber, Ben Stewart, Martin Conte Mac Donnell, Samantha Davison, and Jason Vogrinec.
Thanks to Andrew Johnson, Taya Steere, and Evan Davis for taking Cartography from 0 to 1.
Thanks to those who helped take the project to the next level in production through vuln management and infra scenarios: Eryx Paredes, Zoe Longo, Jason Foote, Sergio Franco, Khanh Le Do, Aneesh Agrawal, Leif Raptis-Firth, Hans Wernetti, Fernando Zarate, Kunaal Sikka, Gaston Kleiman, and Lynx Lean.
Thanks to the maintainers and friends of Cartography: Ramon Petgrave, Chandan Chowdhury, Jeremy Chapeau, Marco Lancini, Ryan Lane, Kedar Ghule, Purusottam Mupunu, Daniel D’Agostino, Ashley Lowde, and Daniel Brauer.
Additional thanks to Matt Klein for mentorship in managing an open source project.
Finally, thank you to everyone who has tried out Cartography, raised an issue, shared code in a pull request, provided feedback, or otherwise interacted with the community or project in any way. There have been so many people involved in this journey — thank you for your contributions.
If you think in graphs and not lists, you should apply to work on Lyft’s security team. We’re a small team that absolutely punches above our weight in solving big engineering problems. Visit Lyft Careers to see our openings.
Cartography joins the CNCF was originally published in Lyft Engineering on Medium, where people are continuing the conversation by highlighting and responding to this story.
As the legal industry shifted toward digital workflows, a legal tech startup set out to simplify video testimony management for litigators. The platform aimed to cut costs by up to two-thirds compared to traditional processes, while enabling attorneys to easily record, transcribe, and share video testimony.
To bring this vision to life, 8th Light partnered with the startup to build scalable, user-friendly features tailored to early adopters. Here’s how we helped them succeed.
Litigation teams rely on seamless collaboration, making multi-tenant access a critical feature. To support this, 8th Light helped implement:
Result: Improved early adoption by enabling collaboration, reducing friction for new users, and providing actionable insights for the product team.
The platform’s core value lies in its ability to simplify video testimony workflows. 8th Light helped develop:
Result: Empowered attorneys to handle and share digital testimonies without requiring specialized technical skills, reducing dependency on external vendors.
To support rapid, reliable releases and align with stakeholder needs, we helped implement:
Result: Reduced release risks, improved developer productivity, and streamlined communication between stakeholders.
To prepare for user growth, 8th Light assisted in building a scalable infrastructure using:
Result: Scalable infrastructure that minimized operational costs and empowered stakeholders to make adjustments without developer support.
In just four months, we helped deliver essential features that streamlined digital testimony workflows, empowered attorneys, and positioned the platform for scalable growth. By combining cost-saving infrastructure, intuitive tools, and reliable development workflows, the platform is set to transform litigation processes and capture market share in the legal tech industry.
Ready to create scalable solutions for your business? Partner with 8th Light to design, develop, and deploy legal tech solutions that empower users and drive growth. Contact us today to learn more.
Airbnb had a large presence at the 2024 KDD conference hosted in Barcelona, Spain. Our Data Scientists and Engineers presented on topics like Deep Learning & Search Ranking, Online Experimentation & Measurement, Product Quality & Customer Journey, and Two-sided Marketplaces. This blog post summarizes our contributions to KDD 2024 and provides access to the academic papers presented during the conference.
Authors: Huiji Gao, Peter Coles, Carolina Barcenas, Sanjeev Katariya
KDD (Knowledge Discovery and Data Mining) is one of the most prestigious global conferences in data mining and machine learning. Hosted annually by a special interest group of the Association for Computing Machinery (ACM), it’s where attendees learn about some of the most ground-breaking AI developments in data mining, machine learning, knowledge discovery, and large-scale data analytics.
This year, the 30th KDD conference was held in Barcelona, Spain, attracting thousands of researchers and scientists from academia and industry. Many companies contributed to and attended the conference, including Google, Meta, Apple, Amazon, Airbnb, Pinterest, LinkedIn, Booking, Expedia, and ByteDance. There were 151 Applied Data Science (ADS) track papers and 411 Research track papers accepted, along with 34 tutorials and 30 workshops.
Airbnb had a significant presence at KDD 2024 with three full ADS track papers (acceptance rate under 20%), one workshop, and seven workshop papers and invited talks accepted into the main conference proceedings. The topics of our work spanned Deep learning & Search Ranking, Online Experimentation & Measurement, Causal Inference & Machine Learning, and Two-sided Marketplaces.
In this blog post, we will summarize our teams’ contributions and share highlights from an exciting week-long conference with research and industry talks, workshops, panel discussions, and more.
Intelligent search ranking — the process of accurately matching a guest with a listing based on their preferences, a listing’s features, and additional search context — remains a nuanced challenge that researchers are constantly trying to solve.
Making optimal guest-host matches has remained an issue in a two-sided marketplace for a variety of reasons — the timespan of guest searches (ranging between days and weeks), unpredictable host behavior and ratings (the potential for hosts to cancel a booking or receive low ratings), and limited understanding of guest preference across multiple interfaces. We published several papers addressing the issue of search ranking as part of our presence at KDD.
Learning to Rank for Maps at Airbnb
Airbnb brings together hosts who rent listings to prospective guests from around the globe. Results from a guest’s search for listings are displayed primarily through two interfaces: (1) as a list of rectangular cards that contain on them the listing image, price, rating, and other details, referred to as list-results, and (2) as oval pins on a map showing the listing price, called map-results. Both these interfaces, since their inception, have used the same ranking algorithm that orders listings by their booking probabilities and selects the top listings for display.
However, some of the basic assumptions underlying ranking are built for a world where search results are presented as lists and simply break down for map-results. In this work, we rebuilt ranking for maps by revising the mathematical foundations of how users interact with map search results. Our iterative and experiment-driven approach led us through a path full of twists and turns, ending in a unified theory for the two interfaces.
Our journey shows how assumptions taken for granted when designing machine learning algorithms may not apply equally across all user interfaces, and how they can be adapted. The net impact was one of the largest improvements in user experience for Airbnb, which we discuss as a series of experimental validations. The work introduced in this paper is merely the beginning of future research projects, such as making learning to rank unbiased for map-results and demarcating map pins to direct user attention toward the more relevant ones.
Multi-objective Learning to Rank by Model Distillation
In online marketplaces, the objective of search ranking is not only to optimize the purchase or conversion rate (the primary objective), but also to improve purchase outcomes (secondary objectives), e.g., order cancellations, review ratings, customer service inquiries, and long-term platform growth. To balance these primary and secondary objectives, several multi-objective learning to rank approaches have been widely studied.
Traditional approaches in industrial search and recommender systems encounter challenges such as expensive parameter tuning that leads to sub-optimal solutions, imbalanced data sparsity issues, and a lack of compatibility with ad-hoc objectives. In this work, we propose a distillation-based solution for multi-objective ranking, which optimizes the end-to-end ranking system at Airbnb across multiple ranking models with different objectives, along with various considerations to keep training and serving efficiency at industry standards.
Compared with traditional approaches, the proposed solution not only increases the primary conversion objective by a large margin, but also addresses the secondary-objective constraints while improving model stability. Furthermore, we demonstrated that the proposed system could be further simplified by model self-distillation. We also ran additional simulations showing that this approach can efficiently inject ad-hoc, non-differentiable business objectives into the ranking system while letting us balance our optimization objectives.
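For readers who want a concrete picture of what distillation for ranking can look like, here is a minimal PyTorch sketch (not Airbnb’s implementation; the feature size, teacher models, and blending weight are all assumptions): two objective-specific teachers supply a soft target that a single student ranker learns alongside its primary booking objective.

# A minimal sketch (assumptions noted above) of distilling two
# objective-specific teacher rankers into a single student model.
import torch
import torch.nn as nn

N_FEATURES = 16  # hypothetical listing/query feature size

class Scorer(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(N_FEATURES, 32), nn.ReLU(), nn.Linear(32, 1))
    def forward(self, x):
        return self.net(x).squeeze(-1)

# Pretend these teachers were already trained on their own objectives.
teacher_conversion, teacher_cancellation = Scorer(), Scorer()
student = Scorer()
opt = torch.optim.Adam(student.parameters(), lr=1e-3)

features = torch.randn(256, N_FEATURES)        # a batch of candidate listings
booked = torch.randint(0, 2, (256,)).float()   # primary-objective labels

W_SECONDARY = 0.3  # hypothetical weight on the secondary objective
with torch.no_grad():
    # Blend teacher scores into a single soft target for the student.
    soft_target = teacher_conversion(features) - W_SECONDARY * teacher_cancellation(features)

student_score = student(features)
loss = nn.functional.binary_cross_entropy_with_logits(student_score, booked) \
       + nn.functional.mse_loss(student_score, soft_target)
loss.backward()
opt.step()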
Online experimentation (e.g., A/B testing) is a common way for organizations like Airbnb to make data-driven decisions. But high variance is frequently a challenge. For example, it’s hard to prove that a change in our search UX will drive value because bookings can be infrequent and depend on a large number of interactions over a long period of time.
Metric Decomposition in A/B Tests
More than a decade ago, CUPED (Controlled Experiments Utilizing Pre-Experiment Data) mainstreamed the idea of variance reduction leveraging pre-experiment covariates. Since its introduction, it has been implemented, extended, and modernized by major online experimentation platforms. Despite the wide adoption, it is known by practitioners that the variance reduction rate from CUPED, utilizing pre-experimental data, varies case by case and has a theoretical limit. In theory, CUPED can be extended to augment a treatment effect estimator utilizing in-experiment data, but practical guidance on how to construct such an augmentation is lacking.
In this work, we fill this gap by proposing a new direction for sensitivity improvement via treatment effect augmentation, whereby a target metric of interest is decomposed into two or more components in an attempt to isolate those with high signal and low noise from those with low signal and high noise. We show through theory, simulation, and empirical examples that if such a decomposition exists (or can be engineered), sensitivity may be increased via approximately null augmentation (in a frequentist setting) and reduced posterior variance (in a Bayesian setting).
We provide three real-world applications demonstrating different flavors of metric decomposition. These applications illustrate the gain in agility that metric decomposition yields relative to an un-decomposed analysis, indicating both empirically and theoretically the value of this practice in both frequentist and Bayesian settings. An important extension to this work would be to consider sample size determination in the frequentist and Bayesian contexts; while a boost in sensitivity typically means less data is required for a given analysis, a methodology that determines the smallest sample size required to control various operating characteristics in this context would be of practical value.
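The paper builds on CUPED, so a minimal sketch of the classic pre-experiment adjustment it extends may help; the data here is synthetic and the covariate choice is purely illustrative.

# A minimal CUPED-style sketch: reduce the variance of a treatment-effect
# estimate using a pre-experiment covariate. Synthetic data only.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
pre_bookings = rng.poisson(2.0, n)                      # pre-experiment covariate X
treated = rng.integers(0, 2, n)                         # random assignment
y = pre_bookings + 0.1 * treated + rng.normal(0, 1, n)  # in-experiment metric Y

theta = np.cov(y, pre_bookings)[0, 1] / np.var(pre_bookings)
y_adj = y - theta * (pre_bookings - pre_bookings.mean())  # CUPED-adjusted metric

def effect_and_se(metric):
    # Difference in means and its standard error between treatment and control.
    diff = metric[treated == 1].mean() - metric[treated == 0].mean()
    se = np.sqrt(metric[treated == 1].var() / (treated == 1).sum()
                 + metric[treated == 0].var() / (treated == 0).sum())
    return diff, se

print("raw:  ", effect_and_se(y))
print("cuped:", effect_and_se(y_adj))  # same effect estimate, smaller standard error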
Airbnb employees hosted a workshop on Two-sided Marketplace Optimization: Search, Pricing, Matching & Growth. This workshop brought practitioners of two-sided marketplaces together and discussed the evolution of content ranking, recommendation systems, and data mining when solving for producers and consumers on these platforms.
Two-sided marketplaces have recently emerged as viable business models for many real-world applications. They model transactions as a network with two distinct types of participants: one type representing the supply and the other the demand for a specific good. Traditionally, research related to online marketplaces focused on how to better satisfy demand. But with two-sided marketplaces, there is more nuance at play. Modern global examples, like Airbnb, operate platforms where users provide services; users may be hosts or guests. Such platforms must develop models that address all their users’ needs and goals at scale. Machine learning-powered methods and algorithms are essential in every aspect of such complex, internet-scale, two-sided marketplaces.
Airbnb is a community based on connection and belonging — we strive to connect people and places. Our contributions to this workshop showcase the work we’re doing to support this mission by optimizing guest experiences, finding equilibrium points for listing prices, reducing the incidence of poor interactions (and customer support costs as a side effect), detecting when operational staff should follow up on activity at scale, and more.
Guest Intention Modeling for Personalization
Airbnb has transformed the way people travel by offering unique and personalized stays in destinations worldwide. To provide a seamless and tailored experience, understanding user intent plays an important role.
However, limited user data and unpredictable guest behavior can make it difficult to infer guests’ underlying intent. Our work shows how we approach this challenging problem. We describe how we apply a deep learning approach to predict difficult-to-infer details of a user’s travel plan, such as the next destination and travel dates. The framework analyzes high-level information from users’ in-app browsing history, booking history, search queries, and other engagement signals, and produces multiple user intent signals.
Marketing emails, flexible travel search (e.g., for “Europe in the summer”), and recommendations on the app home page are three guest interactions that benefit from correct intention modeling. Hosts also benefit, since a clear understanding of guest demand can help them optimize listings to increase satisfaction and bookings.
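As a rough illustration of the kind of sequence model described above (a sketch only; the event vocabulary, destination set, and architecture are assumptions, not Airbnb’s production framework):

# A minimal sketch: turn a user's recent engagement events into
# next-destination and travel-date predictions.
import torch
import torch.nn as nn

N_EVENT_TYPES, N_DESTINATIONS, EMB, HIDDEN = 1000, 500, 32, 64

class IntentModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(N_EVENT_TYPES, EMB)        # searches, views, bookings, ...
        self.encoder = nn.GRU(EMB, HIDDEN, batch_first=True)
        self.dest_head = nn.Linear(HIDDEN, N_DESTINATIONS)   # next destination
        self.date_head = nn.Linear(HIDDEN, 1)                # e.g., days until the trip

    def forward(self, event_ids):
        _, h = self.encoder(self.embed(event_ids))
        h = h.squeeze(0)
        return self.dest_head(h), self.date_head(h)

model = IntentModel()
events = torch.randint(0, N_EVENT_TYPES, (8, 20))  # batch of 8 users, 20 recent events each
dest_logits, days_out = model(events)
print(dest_logits.shape, days_out.shape)  # (8, 500) and (8, 1)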
Hosts can find it difficult to price their listings correctly in two-sided marketplaces where supply is provided by end users. Most hosts are not professional hospitality workers and would benefit from access to data and advice on how guests see their listings and how they compare to other listings in their neighborhood. We constantly look for ways to guide hosts toward optimally pricing their listings. The same information can then be used to help guests find their ideal stay.
In our paper, we presented an example of how this problem can be solved in general.
Both demand and supply change over time, influencing the equilibrium price for a property at a specific point in time, so a historical optimum has to be adjusted to find the current one. It is difficult to run experiments, since any large-scale experiment we might run will cause the environment to change in complex ways. We tackle this problem by combining economic modeling with causal inference techniques: we segment guests, estimate how price-sensitive each segment is, and fine-tune those estimates with empirical data from small targeted experiments and larger-scale natural ones. Hosts can then use the models’ output to make informed tradeoffs between higher occupancy and higher nightly rates.
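As a hedged illustration of segment-level price sensitivity estimation (toy data; the segments, elasticities, and log-log form are assumptions rather than Airbnb’s actual methodology):

# A minimal sketch: estimate price elasticity per guest segment with a log-log fit.
import numpy as np

rng = np.random.default_rng(1)

def estimate_elasticity(log_price, log_demand):
    # The slope of log(demand) on log(price) approximates price elasticity.
    slope, _ = np.polyfit(log_price, log_demand, 1)
    return slope

segments = {"budget": -1.8, "family": -1.1, "business": -0.6}  # hypothetical true elasticities
for name, true_elasticity in segments.items():
    log_price = np.log(rng.uniform(80, 300, 2000))
    log_demand = true_elasticity * log_price + rng.normal(0, 0.3, 2000)
    est = estimate_elasticity(log_price, log_demand)
    # A host-facing tool could combine these estimates with supply and demand trends
    # to suggest a nightly rate that trades off occupancy against price.
    print(f"{name}: estimated elasticity ~ {est:.2f}")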
Listing Embedding for Host-side Products
In order to facilitate the matching of listings and guests, Airbnb provides numerous products and services to both hosts and guests. Many of these tools are based on the ability to compare listings, i.e. finding similar listings or listings that may be viewed as equivalent substitutes. Our work presents a study on the application and learning of listing embeddings in Airbnb’s two-sided marketplace. Specifically, we discuss the architecture and training of a neural network embedding model using guest side engagement data, which is then applied to host-side product surfaces. We address the key technical challenges we encountered, including the formulation of negative training examples, correction of training data sampling bias, and the scaling and speeding up training with the help of in-model caching. Additionally, we discuss our comprehensive approach to evaluation, which ranges from in-batch metrics and vocabulary-based evaluation to the properties of similar listings. Finally, we share our insights from utilizing listing embeddings in Airbnb products, such as host calendar similar listings.
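A minimal sketch of the contrastive setup described above, where other examples in the batch serve as negatives (the listing count, embedding size, and temperature are assumptions; this is not the production model):

# A minimal sketch: learn listing embeddings from guest engagement pairs
# using in-batch negatives.
import torch
import torch.nn as nn
import torch.nn.functional as F

N_LISTINGS, DIM = 10_000, 64
embeddings = nn.Embedding(N_LISTINGS, DIM)
opt = torch.optim.Adam(embeddings.parameters(), lr=1e-3)

# Each row is a (listing the guest engaged with, listing they engaged with next) pair.
anchors = torch.randint(0, N_LISTINGS, (128,))
positives = torch.randint(0, N_LISTINGS, (128,))

a = F.normalize(embeddings(anchors), dim=-1)
p = F.normalize(embeddings(positives), dim=-1)
logits = a @ p.T / 0.05                      # every other row's positive acts as a negative
loss = F.cross_entropy(logits, torch.arange(len(anchors)))
loss.backward()
opt.step()

# At serving time, "similar listings" are nearest neighbors in this embedding space.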
Customer Support Optimization in Search Ranking
As of the date of the paper, Airbnb had more than 7.7 million listings from more than 5 million hosts worldwide. Airbnb is investing both in rapid growth and in making sure that the booking experience is pleasant for hosts and guests. It would, however, be ideal to avoid poor experiences in the first place. Our work highlights how we prevent poor experiences without significantly reducing growth.
We use the mass of accumulated support data at Airbnb to model the probability that, if the current user were to book a listing, they would require CS support. Our model identified multiple features of the searcher, home, and host that accurately predict CS needs. For example, same-day bookings tend to require more support, and a responsive host tends to reduce support needs. So, if a guest chooses a same-day booking, matching them with a highly responsive host can lead to a better experience overall. We incorporate the output of our CS support model into search result rankings; listings will sometimes rank lower if we predict that a booking would lead to a negative experience.
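One simple way to picture how a CS-need prediction could be folded into ranking (illustrative numbers only; the penalty form and weights are assumptions, not the paper’s exact formulation):

# A minimal sketch: discount booking probability by predicted CS-support risk.
import numpy as np

listings = ["A", "B", "C"]
p_book = np.array([0.30, 0.28, 0.25])   # output of the booking-probability ranker
p_cs = np.array([0.02, 0.30, 0.05])     # predicted chance the booking needs CS support

CS_PENALTY = 0.5  # hypothetical tradeoff parameter tuned by experiment
score = p_book * (1.0 - CS_PENALTY * p_cs)

for name in [listings[i] for i in np.argsort(-score)]:
    print(name)  # listing B drops below C despite its higher booking probability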
LLM Pretraining using Activity Logs
It’s often important to follow up with users after they’ve had a long series of interactions with a two-sided marketplace to help make sure that their experiences are of high quality. When user interactions meet certain business criteria, operations agents create tickets to follow up with them. For example, user retention and reactivation agents might review user activity logs and decide to follow up with the user, to encourage them to re-engage with the platform.
We propose transforming structured data (activity logs) into a more manageable text format and then leveraging modern language models (i.e., BERT) to pretrain a large language model on user activities. We then fine-tuned the model using historical data about which users were followed up with and checked its predictions. Our work demonstrates that a language model trained on pre-processed activity logs can identify, at an experimentally significant rate, when a user should be followed up with. Our preliminary results suggest that this framework may outperform, by 80% in average precision, a similar model that relied heavily on feature engineering.
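A minimal sketch of the log-to-text idea using the Hugging Face transformers library (the base model, log schema, and label meaning are assumptions; the real system’s pretraining and fine-tuning are far more involved):

# A minimal sketch: serialize an activity log to text and classify it with BERT.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

def log_to_text(events):
    # Flatten structured log rows into a plain-language sequence the LM can read.
    return " [SEP] ".join(f"{e['ts']} {e['action']} {e.get('detail', '')}".strip() for e in events)

activity = [
    {"ts": "2024-05-01", "action": "opened_app"},
    {"ts": "2024-05-03", "action": "searched", "detail": "beach stays in June"},
    {"ts": "2024-05-20", "action": "abandoned_checkout"},
]

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

inputs = tokenizer(log_to_text(activity), return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits
follow_up_prob = torch.softmax(logits, dim=-1)[0, 1].item()  # label 1 = "worth a follow-up"
print(follow_up_prob)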
Typically, product quality is evaluated based on structured data. Customer ratings, types of support issues, resolution times, and other factors are used as a proxy for how someone booking on Airbnb might value a listing. This kind of data has limitations — more popular listings have more data, often users don’t leave feedback, and feedback is usually biased towards the positive (users with negative experiences tend to churn and not give feedback).
In the Workshop on Causal Inference and Machine Learning in Practice, we highlighted an example of how we push the boundaries of product quality assessment techniques and applications, mixing traditional causal inference with cutting-edge machine learning research. In our work “Understanding Product Quality with Unstructured Data: An Application of LLMs and Embeddings at Airbnb”, we presented how an approach based on text embeddings and LLMs can be combined with approaches based on structured data to significantly improve product quality evaluations. We generate text embeddings on a mix of listing and review texts, then cluster the embeddings based on rebooking and churn rates. Once we have clear clusters, we extract keywords from the original data and use these keywords to calculate a listing quality score based on similarity to the keyword list.
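A toy version of the pipeline described above, with TF-IDF vectors standing in for LLM embeddings (the reviews, cluster count, and scoring rule are illustrative assumptions):

# A minimal sketch: cluster review text, find the low-rebooking cluster,
# and score new text by similarity to that cluster's vocabulary.
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

reviews = [
    "spotless place, great host, would book again",
    "amazing stay, super clean and easy check in",
    "dirty bathroom and the host never replied",
    "misleading photos, felt unsafe, will not return",
]
rebooked = [1, 1, 0, 0]  # proxy outcome used to interpret clusters

vec = TfidfVectorizer()
X = vec.fit_transform(reviews)
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

def rebook_rate(c):
    members = [r for r, k in zip(rebooked, clusters) if k == c]
    return sum(members) / len(members)

# Treat the low-rebooking cluster's review text as the "quality issue" vocabulary,
# then score new review text by its similarity to that vocabulary.
bad_cluster = min(set(clusters), key=rebook_rate)
issue_text = " ".join(t for t, c in zip(reviews, clusters) if c == bad_cluster)
new_review = vec.transform(["lovely but the bathroom was dirty"])
issue_score = cosine_similarity(new_review, vec.transform([issue_text]))[0, 0]
print(bad_cluster, round(issue_score, 3))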
In addition, we were invited to give a talk on Quality Foundations at Airbnb at KDD’s 3rd Workshop on End-to-End Customer Journey Optimization. It’s often hard to differentiate the quality of customer experiences using simple review ratings, in part due to the tightness of their distribution. In this talk, we present an alternative notion of quality based on customers’ revealed preference: did a customer return to use the platform again after their experience? We describe how a metric — Guest Return Propensity (GRP) — leverages this concept and can differentiate quality, capture platform externalities, and predict future returns.
In practice, this measure may not be suited to many common business use cases due to its lagging nature and an inability to easily explain why it has changed. We describe a quality measurement system that builds on the conceptual foundation of GRP by modeling it as an outcome of upstream realized quality signals. These signals — from sources like reviews and customer support — are weighted by their impact on return propensity and mapped to a quality taxonomy to aid in explainability. The resulting score is capable of finely differentiating the quality of customer experiences, aiding tradeoff decisions, and providing timely insights.
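A hedged sketch of the weighting idea described above: estimate each signal’s impact on return propensity, then combine signals into a single quality score (the signals, data, and model choice are assumptions, not the production system):

# A minimal sketch: weight upstream quality signals by their estimated
# impact on whether a guest returns, then combine them into one score.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
n = 5000
signals = np.column_stack([
    rng.integers(1, 6, n),   # review rating
    rng.integers(0, 2, n),   # had a customer-support contact
    rng.integers(0, 2, n),   # host canceled
])
# Synthetic "did the guest return" outcome loosely driven by the signals.
logit = 0.6 * signals[:, 0] - 1.2 * signals[:, 1] - 2.0 * signals[:, 2] - 1.0
returned = rng.binomial(1, 1 / (1 + np.exp(-logit)))

model = LogisticRegression().fit(signals, returned)
weights = model.coef_[0]  # each signal's estimated impact on return propensity

def quality_score(rating, cs_contact, host_cancel):
    return float(weights @ np.array([rating, cs_contact, host_cancel]))

print(quality_score(5, 0, 0), quality_score(3, 1, 1))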
The 2024 edition of KDD was an amazing opportunity for data scientists and machine learning engineers from industry, government, and academia across the globe to connect and exchange learnings and discoveries. We were honored to have the opportunity to share some of our knowledge and techniques, generalizing what we have learned from applying machine learning to problems we see at Airbnb. We continue to focus on improving our customers’ experience and growing our business, and the work we’ve shared has been crucial to our success. We’re excited to continue learning from peers and contributing our work back to the community, and we eagerly await the advancements that come as others build upon what we’ve shared.
Below, you’ll find a complete list of the talks and papers shared in this article along with the team members who contributed. If this type of work interests you, we encourage you to apply for an open position today.
Learning to Rank for Maps at Airbnb (link)
Authors: Malay Haldar, Hongwei Zhang, Kedar Bellare, Sherry Chen, Soumyadip Banerjee, Xiaotang Wang, Mustafa Abdool, Huiji Gao, Pavan Tapadia, Liwei He, Sanjeev Katariya
Multi-objective Learning to Rank by Model Distillation (link)
Authors: Jie Tang, Huiji Gao, Liwei He, Sanjeev Katariya
Metric Decomposition in A/B Tests (link)
Authors: Alex Deng (former employee at Airbnb), Luke Hagar (University of Waterloo), Nathaniel T. Stevens (University of Waterloo), Tatiana Xifara (Airbnb), Amit Gandhi (University of Pennsylvania)
Understanding Guest Preferences and Optimizing Two-sided Marketplaces: Airbnb as an Example (link)
Authors: Yufei Wu, Daniel Schmierer
Predicting Potential Customer Support Needs and Optimizing Search Ranking in a Two-Sided Marketplace (link)
Authors: Do-kyum Kim, Han Zhao, Huiji Gao, Liwei He, Malay Haldar, Sanjeev Katariya
Understanding User Booking Intent at Airbnb (link)
Authors: Xiaowei Liu, Weiwei Guo, Jie Tang, Sherry Chen, Huiji Gao, Liwei He, Pavan Tapadia, Sanjeev Katariya
Can Language Models Accelerate Prototyping for Non-Language Data? Classification & Summarization of Activity Logs as Text (link)
Authors: José González-Brenes
Learning and Applying Airbnb Listing Embeddings in Two-Sided Marketplace (link)
Authors: Siarhei Bykau, Dekun Zou
Understanding Product Quality with Unstructured Data: An Application of LLMs and Embeddings at Airbnb (link)
Authors: Jikun Zhu, Zhiying Gu, Brad Li, Linsha Chen
Invited Talk: Quality Foundations at Airbnb
Speakers: Peter Coles, Mike Egesdal
Airbnb at KDD 2024 was originally published in The Airbnb Tech Blog on Medium, where people are continuing the conversation by highlighting and responding to this story.
This article was authored by Goli Cobbe, Danny Godbout, Mani Najmabadi, Ryan Hillman, and Scott Noblit
Few topics ignite excitement quite like planning a trip to a new destination where travelers can explore new sights, savor delicious food, and create lasting memories with friends and family. The travel industry presents a unique opportunity for AI applications because it involves both the thrill and the anxieties of exploring new places. Generative AI (genAI) has emerged as the latest buzzword in the AI field, offering innovative possibilities. While our teams at Expedia Group™ are eager to harness genAI’s potential, it can be expensive to deploy, has higher computation times, and runs the risk of hallucinations. There are plenty of experiences where traditional AI can better achieve the desired business and user goals.
Generative AI models excel at content creation and reasoning by analyzing patterns from extensive data sets, with common applications such as chatbots and text summarization. Their creative capabilities make them powerful tools, and the breadth of their training data makes genAI models broadly useful. Conversely, traditional AI is more established in consumer products, focusing on analyzing patterns and making predictions based on structured, domain-specific data. Solutions using traditional AI are generally easier to scale and more reliable, with prevalent use cases including recommendation engines, predictive models, and personalization.
Generative AI use cases include:
Traditional predictive AI use cases include:
1. Identify the problem type you are trying to solve
The first step is to clearly define the problem you’re addressing. We always start with the traveler: what problems are we trying to solve, and what makes sense for the experience? Are you prioritizing creativity and inspiration, or operational accuracy and efficiency? Large Language Models (LLMs) carry a higher risk of inaccuracies, making traditional AI a safer choice for applications requiring high precision.
2. Evaluate available data
The type and structure of your data should heavily influence your choice. Traditional AI thrives on structured data, whereas genAI can be deployed with minimal or unstructured data. GenAI and LLMs are particularly well-suited for large volumes of unstructured data.
3. Determine how quickly you need to deliver
GenAI and foundation models are generally more adaptable and can lead to faster time-to-market. However, this comes at the cost of potentially higher long-term expenses, longer computation times, and the risk of lower output quality if guardrails are not implemented. Traditional AI models, while slower to adjust to new data, may offer more predictable and optimized performance over time.
4. Establish performance measures
We establish performance and quality measures against which to evaluate to what extent the solution addresses the problem. In the early stages of an AI product, we evaluate quality metrics such as tone of voice, customer delight, usability, relevance, engagement, and latency. Be open to investing in a more costly solution if it better addresses the unique aspects of your problem — particularly if innovation is a key driver.
5. Evaluate costs
Examine not only the inference cost for a particular solution but also the overall cost-benefit tradeoff in relation to key business metrics. Consider non-AI solutions, such as off-the-shelf products or internal APIs, that may provide similar capabilities.
6. Determine how scalable the solution is
As user adoption increases, understanding the scalability of your solution is essential. Consider factors like latency and data availability, ensuring you forecast these needs to prevent scaling issues.
The sections below offer guidance and examples of when to choose traditional AI, genAI, or a hybrid approach.
A common use of traditional AI is search relevance and search optimization. We use machine learning to tailor search results, delivering personalized and more efficient experiences. Predictive models trained on previous activity also help rank search results by anticipating which results are most likely to meet a user’s needs, and ranking those higher.
Traditional AI can also be used for semantic matching to decipher the underlying meaning of certain words or phrases. For example, we use a combination of semantic matching ML models with review and image embeddings to display results for unique searches such as “Hotel on a golf course.”
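A minimal sketch of embedding-based semantic matching (the sentence-transformers model and listing snippets are stand-ins, not Expedia’s production models):

# A minimal sketch: match a natural-language query to listing text by embedding similarity.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")
query = "hotel on a golf course"
listings = [
    "Resort with an 18-hole championship golf course on site",
    "Downtown boutique hotel near the convention center",
    "Beachfront condo with ocean views and a pool",
]
scores = util.cos_sim(model.encode(query), model.encode(listings))[0]
best = int(scores.argmax())
print(listings[best], float(scores[best]))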
Another type of traditional AI, propensity models, can predict the likelihood that a certain event will drive behavior for a specific user and that information can be used to drive engagement. For example, propensity models help us to determine when we should use promotion targeting to increase the likelihood of a traveler making a reservation on our site.
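A hedged sketch of a propensity model used to gate promotions (the features, synthetic data, and thresholds are assumptions):

# A minimal sketch: a booking-propensity model that decides whether to show a promotion.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
n = 20_000
X = np.column_stack([
    rng.integers(0, 30, n),   # days since last visit
    rng.integers(0, 15, n),   # searches in the last week
    rng.integers(0, 2, n),    # has items saved to a trip
])
booked = rng.binomial(1, 1 / (1 + np.exp(-(0.15 * X[:, 1] + 1.0 * X[:, 2] - 0.05 * X[:, 0] - 2))))

propensity = LogisticRegression(max_iter=1000).fit(X, booked)

def should_show_promo(features, low=0.2, high=0.6):
    # Target travelers who are persuadable: unlikely to book on their own,
    # but engaged enough that a promotion is not wasted.
    p = propensity.predict_proba([features])[0, 1]
    return low <= p <= high

print(should_show_promo([3, 6, 1]))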
GenAI is ideal when you need to distill large amounts of information.
One application of this for Expedia Group is summarizing content from traveler reviews. In this case, the LLM distills large volumes of customer reviews into insights and highlights. Instead of having to read through many reviews, users can see a summarized version that captures key themes and sentiment.
In another application of genAI, we use reviews and property information to allow travelers to get instant answers to travel questions with AI-powered search.
GenAI can also be used to enhance customer experiences, improve operational efficiency, and provide valuable insights by powering an AI assistant.
Romie is Expedia’s automated travel assistant, powered by genAI, and the culmination of Expedia’s travel intelligence, personified — a collection of assistive moments across the discovery, shopping, and trips experiences. The AI contextually delegates to different tools and APIs to handle a user query; on-the-fly reasoning and delegation help travelers home in on their preferences.
Combining the strengths of both genAI and traditional AI can create a truly remarkable traveler experience, and personalization is a great example. Expedia gathers restaurant preference information from the user, combines it with existing data, and creates prompts for the LLM to find authentic local cuisine options near a traveler’s hotel. Combining the predictive data from traditional AI with the content-creation power of genAI results in a more personalized experience.
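A sketch of this hybrid pattern under stated assumptions: preference data that a traditional model or profile store would supply is folded into an LLM prompt. The OpenAI client and model name here are placeholders for whatever LLM endpoint is actually used.

# A minimal sketch: build a prompt from modeled traveler preferences and call an LLM.
from openai import OpenAI

def build_prompt(traveler):
    # "traveler" preferences would come from propensity/preference models and profile data.
    return (
        f"The traveler is staying at {traveler['hotel']} and prefers "
        f"{', '.join(traveler['cuisines'])} at a {traveler['price_band']} price point. "
        "Suggest three authentic local restaurants within walking distance, one sentence each."
    )

traveler = {"hotel": "a hotel near La Rambla, Barcelona",
            "cuisines": ["Catalan", "seafood"],
            "price_band": "mid-range"}

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{"role": "user", "content": build_prompt(traveler)}],
)
print(response.choices[0].message.content)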
The choice between generative AI and traditional AI depends on your business and user experience goals, and both options should be evaluated in making product decisions. While generative AI excels in creating dynamic content and enhancing customer interactions through applications like chatbots, traditional AI offers robust and lower cost solutions for many use cases. By evaluating factors such as data availability, data structure, delivery timelines, and cost, you can make informed decisions on which AI approach to adopt. Embracing a hybrid strategy that combines the strengths of both generative and traditional AI can lead to more personalized and efficient experiences. While generative AI is exciting and enables creativity, traditional AI remains a great choice for many applications.
Elevating Travel Experiences with AI was originally published in Expedia Group Technology on Medium, where people are continuing the conversation by highlighting and responding to this story.
This article is the first in a multi-part series sharing a breadth of Analytics Engineering work at Netflix, recently presented as part of our annual internal Analytics Engineering conference. We kick off with a few topics focused on how we’re empowering Netflix to efficiently produce and effectively deliver high quality, actionable analytic insights across the company. Subsequent posts will detail examples of exciting analytic engineering domain applications and aspects of the technical craft.
At Netflix, we seek to entertain the world by ensuring our members find the shows and movies that will thrill them. Analytics at Netflix powers everything from understanding what content will excite and bring members back for more to how we should produce and distribute a content slate that maximizes member joy. Analytics Engineers deliver these insights by establishing deep business and product partnerships; translating business challenges into solutions that unblock critical decisions; and designing, building, and maintaining end-to-end analytical systems.
Each year, we bring the Analytics Engineering community together for an Analytics Summit — a 3-day internal conference to share analytical deliverables across Netflix, discuss analytic practice, and build relationships within the community. We covered a broad array of exciting topics and wanted to spotlight a few to give you a taste of what we’re working on across Analytics Engineering at Netflix!
At Netflix, like in many organizations, creating and using metrics is often more complex than it should be. Metric definitions are often scattered across various databases, documentation sites, and code repositories, making it difficult for analysts and data scientists to find reliable information quickly. This fragmentation leads to inconsistencies and wastes valuable time as teams end up reinventing metrics or seeking clarification on definitions that should be standardized and readily accessible.
Enter DataJunction (DJ). DJ acts as a central store where metric definitions can live and evolve. Once a metric owner has registered a metric into DJ, metric consumers throughout the organization can apply that same metric definition to a set of filtered records and aggregate to any dimensional grain.
As an example, imagine an analyst wanting to create a “Total Streaming Hours” metric. To add this metric to DJ, they need to provide two pieces of information:
1. The SQL defining the underlying fact:
SELECT
  account_id, country_iso_code, streaming_hours
FROM streaming_fact_table
2. The metric expression: `SUM(streaming_hours)`
Then metric consumers throughout the organization can call DJ to request either the SQL or the resulting data. For example,
dj.sql(metrics=["total_streaming_hours"], dimensions=["account_id"])
dj.sql(metrics=["total_streaming_hours"], dimensions=["country_iso_code"])
dj.sql(metrics=["total_streaming_hours"], dimensions=["country_iso_code"], filters=["country_iso_code = 'US'"])
The key here is that DJ can perform the dimensional join on users’ behalf. If country_iso_code doesn’t already exist in the fact table, the metric owner only needs to tell DJ that account_id is the foreign key to a `users_dimension_table` (we call this process “dimension linking”). DJ can then perform the joins to bring in any requested dimensions from `users_dimension_table`.
The Netflix Experimentation Platform heavily leverages this feature today by treating cell assignment as just another dimension that it asks DJ to bring in. For example, to compare the average streaming hours in cell A vs cell B, the Experimentation Platform relies on DJ to bring in “cell_assignment” as a user’s dimension (no different from country_iso_code). A metric can therefore be defined once in DJ and be made available across analytics dashboards and experimentation analysis.
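For example, an experiment readout could follow the same dj.sql pattern shown above (the experiment filter below is hypothetical, and the snippet reuses the dj client from the earlier examples):

# Hypothetical: request streaming hours broken out by experiment cell,
# using cell_assignment as just another dimension.
dj.sql(metrics=["total_streaming_hours"],
       dimensions=["cell_assignment"],
       filters=["experiment_name = 'my_experiment'"])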
DJ has a strong pedigree — there are several prior semantic layers in the industry (e.g., Minerva at Airbnb; dbt Transform, Looker, and AtScale as paid solutions). DJ stands out as an open source solution that is actively developed and stress-tested at Netflix. We’d love to see DJ easing your metric creation and consumption pain points!
At Netflix, we rely on data and analytics to inform critical business decisions. Over time, this has resulted in large numbers of dashboard products. While such analytics products are tremendously useful, we noticed a few trends:
Analytics Enablement is a collection of initiatives across Data & Insights all focused on empowering Netflix analytic practitioners to efficiently produce and effectively deliver high-quality, actionable insights.
Specifically, these initiatives are focused on enabling analytics rather than on the activities that produce analytics (e.g., dashboarding, analysis, research, etc.).
As part of broad analytics enablement across all business domains, we invested in a chatbot that provides real insights to our end users using the power of LLMs. One reason LLMs are well suited to such problems is that they tie the versatility of natural language to the power of data queries, enabling our business users to query data that would otherwise require sophisticated knowledge of the underlying data models.
Besides providing the end user with an instant answer in a preferred data visualization, the chatbot, LORE, instantly learns from the user’s feedback. This allows us to teach the LLM a context-rich understanding of internal business metrics that were previously locked in custom code for each of the dashboard products.
Some of the challenges we run into:
Democratizing analytics can unlock the tremendous potential of data for everyone within the company. With Analytics enablement and LORE, we’ve enabled our business users to truly have a conversation with the data.
At Netflix, we use Amazon Web Services (AWS) for our cloud infrastructure needs, such as compute, storage, and networking to build and run the streaming platform that we love. Our ecosystem enables engineering teams to run applications and services at scale, utilizing a mix of open-source and proprietary solutions. In order to understand how efficiently we operate in this diverse technological landscape, the Data & Insights organization partners closely with our engineering teams to share key efficiency metrics, empowering internal stakeholders to make informed business decisions.
This is where our team, Platform DSE (Data Science Engineering), comes in to enable our engineering partners to understand what resources they’re using, how effectively they utilize those resources, and the cost associated with their resource usage. By creating curated datasets and democratizing access via a custom insights app and various integration points, downstream users can gain granular insights essential for making data-driven, cost-effective decisions for the business.
To address the numerous analytic needs in a scalable way, we’ve developed a two-component solution:
As the source of truth for efficiency metrics, our team’s tenets are to provide accurate, reliable, and accessible data; comprehensive documentation to navigate the complexity of the efficiency space; and well-defined Service Level Agreements (SLAs) to set expectations with downstream consumers during delays, outages, or changes.
Looking ahead, we aim to continue onboarding platforms, striving for nearly complete cost insight coverage. We’re also exploring new use cases, such as tailored reports for platforms, predictive analytics for optimizing usage and detecting anomalies in cost, and a root cause analysis tool using LLMs.
Ultimately, our goal is to enable our engineering organization to make efficiency-conscious decisions when building and maintaining the myriad services that allow us to enjoy Netflix as a streaming service. For more detail on our modeling approach and principles, check out this post!
Analytics Engineering is a key contributor to building our deep data culture at Netflix, and we are proud to have a large group of stunning colleagues that are not only applying but advancing our analytical capabilities at Netflix. The 2024 Analytics Summit continued to be a wonderful way to give visibility to one another on work across business verticals, celebrate our collective impact, and highlight what’s to come in analytics practice at Netflix.
To learn more, follow the Netflix Research Site, and if you are also interested in entertaining the world, have a look at our open roles!
Part 1: A Survey of Analytics Engineering Work at Netflix was originally published in Netflix TechBlog on Medium, where people are continuing the conversation by highlighting and responding to this story.