
OpenWeb vs. OpenAI: Comparing the Accuracy of Aida and OpenAI’s Moderation API

By OpenWeb Trust and Safety Team

In July we introduced Aida, OpenWeb’s next-generation, LLM-based moderation ecosystem.

We’ve been refining Aida’s capabilities because healthy communities are key to a healthier web. Publishers who host vibrant communities boost engagement and build valuable first-party data.

Aida performs at unprecedented levels of speed and accuracy, so we decided to put it to the test against one of the most popular and well-known AI moderation alternatives: the OpenAI Moderation API.


Compared to OpenAI’s Moderation API, Aida’s precision is 11.2% higher, its recall is 32% higher, and its F1 score is 28.9% higher.

Not a data scientist? Here’s what you need to know: 

  • Precision measures, of the comments a model rejected, how many truly deserved rejection — higher precision means fewer good comments wrongly removed
  • Recall measures, of the comments that should have been rejected, how many the model actually caught — higher recall means fewer harmful comments slipping through
  • F1 score combines precision and recall into a single measure of overall accuracy.

The higher the score, the more you can be assured that conversations between your users remain healthy and respectful.
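For readers who want to see the arithmetic behind these metrics, here is a minimal sketch (not OpenWeb’s evaluation code) that computes precision, recall, and F1 from raw counts, treating “should be rejected” as the positive class. The counts in the example are hypothetical.

```python
def precision_recall_f1(true_positives, false_positives, false_negatives):
    """Return (precision, recall, F1) from raw moderation counts.

    true_positives:  comments correctly rejected
    false_positives: comments rejected that should have been approved
    false_negatives: comments approved that should have been rejected
    """
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical numbers: of 100 comments the model rejected, 90 truly
# deserved rejection (TP=90, FP=10), and it missed 30 bad ones (FN=30).
p, r, f1 = precision_recall_f1(90, 10, 30)
print(round(p, 3), round(r, 3), round(f1, 3))  # 0.9 0.75 0.818
```

Note that F1 is the harmonic mean of precision and recall, so a model can only score well on F1 by doing well on both.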

The results add up: Aida exists in a moderation ecosystem that’s been refined over nearly 10 years by OpenWeb’s team of engineers and data scientists. In that time, we’ve gathered best practices from practical experience working with thousands of communities across the web. That ecosystem includes crowd signals, over 25 AI/ML models, and user profiling that enriches moderation decisions — all working together to produce these remarkable outcomes.

But how do we test Aida’s accuracy?

Meet the “PoLL”

To arrive at these conclusions, we used an approach known as a Panel of LLM evaluators, or PoLL, which asks multiple language models to evaluate the same content and reaches a majority consensus — reducing the bias of any single model and increasing accuracy.

In other words, a panel of LLMs evaluates each moderation decision made by Aida and by the OpenAI Moderation API. If a majority of the panel agrees with a given decision, it is labeled correct. If a majority disagrees, the decision is labeled incorrect. This creates an unbiased baseline for our testing.
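The majority-vote step described above can be sketched in a few lines. This is an illustrative toy, not OpenWeb’s implementation: in practice each “judge” vote would come from a call to a different LLM, and the panel names and verdicts below are hypothetical.

```python
from collections import Counter

def poll_verdict(judge_votes):
    """Label a moderation decision 'correct' if a strict majority of
    panel judges agrees with it, otherwise 'incorrect'.

    judge_votes: list of "agree" / "disagree" strings, one per judge.
    """
    tally = Counter(judge_votes)
    if tally["agree"] > len(judge_votes) / 2:
        return "correct"
    return "incorrect"

# Three hypothetical judges review one moderation decision:
print(poll_verdict(["agree", "agree", "disagree"]))    # correct
print(poll_verdict(["disagree", "agree", "disagree"])) # incorrect
```

Using an odd number of judges, as in this sketch, guarantees a strict majority always exists and avoids tie-breaking logic.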

These impressive results reflect OpenWeb’s commitment to fostering healthier online communities. Our mission is and has always been to create spaces where conversations can thrive. The impact Aida is having for our publishers and their communities is truly exciting—and we’ll continue building and innovating to save online conversations.

Learn more about Aida here.

Let’s have a conversation.

Right now OpenWeb has a limited number of partners we can work with in order to provide the highest quality service to each and every one. Let us know you’re interested and stay informed about how OpenWeb is empowering publishers and advertisers to change online conversations for good.