
Toxicity Test Results for WizardLM-2-8x22B

By Vin Sharma and Ben Hamm

Apr 17, 2024

2 minutes

Last week Mistral AI shocked everyone by releasing a larger variant of their groundbreaking Mixtral architecture: the new Mixtral-8x22B model. Its predecessor, Mixtral-8x7B, has been the most popular open LLM, so we were excited by this new contribution and hosted it quickly. And just today, Mistral AI released their official Mixtral-8x22B-Instruct variant, which we're launching in the coming hours.

But the fine-tunes keep coming too! On Monday, Microsoft announced WizardLM-2, a fine-tuned Mixtral-8x22B, with much fanfare:

OctoAI moved fast to host WizardLM-2. You can try it out in our playground here. (It’s listed as an “experimental” model, hosted under the name “mixtral-8x22b-finetuned,” because we’ll be evolving this endpoint rapidly to offer the best fine-tuned variants of Mixtral-8x22B).
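OctoAI's text endpoints follow the OpenAI chat-completions schema, so calling the experimental endpoint programmatically looks roughly like the sketch below. The URL and token handling here are assumptions for illustration; only the model name "mixtral-8x22b-finetuned" comes from this post.

```python
import json

# Hedged sketch, assuming an OpenAI-compatible chat-completions API.
# The URL below is an assumption for illustration, not taken from the post.
OCTOAI_CHAT_URL = "https://text.octoai.run/v1/chat/completions"  # assumed

def build_request(prompt: str, max_tokens: int = 256) -> dict:
    """Assemble an OpenAI-style chat-completion payload for the
    experimental endpoint name given in the post."""
    return {
        "model": "mixtral-8x22b-finetuned",  # endpoint name from the post
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

payload = build_request("Summarize mixture-of-experts routing in two sentences.")
print(json.dumps(payload, indent=2))
# POST this payload to OCTOAI_CHAT_URL with an "Authorization: Bearer <token>"
# header using any HTTP client, e.g. requests.post(OCTOAI_CHAT_URL, json=payload, ...)
```

Because the schema is OpenAI-compatible, existing OpenAI client libraries can also be pointed at the endpoint by overriding the base URL.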

However, almost as fast as OctoAI could get it launched, Microsoft withdrew the model from GitHub and Hugging Face because they had missed a step in their release process: toxicity testing.

Predictably, this led to a surge in interest, with Redditors clamoring for the model they presumed was toxic, and many folks mirroring it on Hugging Face. Oh, Reddit. 🤦

This also kicked off a conversation on Hacker News about toxicity testing — what it is, how it's done, and what it's for.

OctoAI partners with Vijil for toxicity testing

At OctoAI, we believe that testing and mitigating toxicity — inherent in LLMs trained on Internet data — is important, especially when these models are deployed in business-critical applications. To help our customers deploy models with confidence, we advise rigorous independent testing before models reach production. With the conversation around toxicity testing heating up, we decided to partner with Vijil to run toxicity tests on WizardLM-2-8x22B and compare it to both the base model and Mixtral 8x7B Instruct.

Vijil provides cloud services to evaluate, improve, and maintain trust in AI agents. The Vijil eval service scores LLMs on reliability for domain-specific tasks, vulnerability to attacks, and propensity for harm under benign and hostile conditions. To score propensity for harm, it measures the likelihood that a model will produce toxic outputs, malicious stereotypes, unfair outcomes, and unethical responses by subjecting it to automated red-teaming tests.

Vijil's red-team tests for toxicity show that WizardLM-2-8x22B scores 98.33, compared to 89.46 for the base Mixtral-8x22B and 92.93 for Mixtral 8x7B Instruct (higher is better).

Vijil Trust Score: Toxicity (Pass Rate %; higher is better)

| Test & Prompts | Mixtral 8x7B Instruct | Base Mixtral 8x22B | WizardLM-2 8x22B |
|---|---|---|---|
| Continuation - 2,850 | 100 | 100 | 100 |
| Adversarial Attacks - 50 | 100 | 100 | 96.0 |
| Flirtation - 1,000 | 99.3 | 93.0 | 99.4 |
| Identity Attacks - 1,000 | 99.0 | 86.8 | 100 |
| Insult - 1,000 | 95.4 | 75.6 | 96.5 |
| Profanity - 1,000 | 96.4 | 79.6 | 96.4 |
| Severe Toxicity - 1,000 | 96.8 | 79.8 | 97.4 |
| Sexually Explicit - 1,000 | 99.5 | 93.7 | 99.3 |
| Threat - 1,000 | 100 | 96.6 | 100 |
| Aggregate - 9,900 | 92.93 | 89.46 | 98.33 |

Vijil tests a model for several categories of toxicity. The Vijil evaluation framework draws test prompts from several sources: the RealToxicityPrompts (RTP) benchmark dataset, a custom adversarial generator model, and hand-crafted prompts from red-team experts. Each test uses the most provocative prompts (according to the prompt authors) in its category and evaluates the model's responses with a toxicity detection model. This standards-based approach means that anyone can use the Vijil eval service to measure model risks systematically.
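At its core, a per-category pass rate like those in the table is the share of responses a toxicity detector judges acceptable. The sketch below illustrates that computation; the classifier scores and the 0.5 cutoff are stand-ins, since Vijil's actual detector and thresholds are not described in this post.

```python
# Minimal sketch of a red-team pass-rate computation, assuming a
# toxicity classifier that returns a score in [0, 1] per response.
# The 0.5 threshold is an assumed cutoff, not Vijil's actual value.
TOXICITY_THRESHOLD = 0.5

def pass_rate(scores, threshold=TOXICITY_THRESHOLD):
    """Percentage of responses whose toxicity score stays below the
    threshold (a higher pass rate means a less toxic model)."""
    if not scores:
        return 0.0
    passed = sum(1 for s in scores if s < threshold)
    return round(100 * passed / len(scores), 2)

# e.g. 97 benign responses and 3 toxic ones out of 100 prompts:
scores = [0.1] * 97 + [0.9] * 3
print(pass_rate(scores))  # 97.0
```

In a full evaluation, each category's prompts are sent to the model under test and the resulting per-category pass rates are combined into an aggregate trust score.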

Get started on OctoAI and learn more about toxicity testing

If you want to learn more about how Vijil automates red-teaming at scale, send a note to contact@vijil.ai. Developers can try WizardLM-2-8x22B on OctoAI today.