
Toxicity Test Results for WizardLM-2-8x22B

By Vin Sharma and Ben Hamm

Apr 17, 2024

2 minutes

Last week Mistral AI shocked everyone by releasing a larger variant of their groundbreaking Mixtral architecture: the new Mixtral-8x22B model. Its predecessor, Mixtral-8x7B, has been the most popular open LLM, so we were excited by this new contribution and hosted it quickly. And just today, Mistral AI released their official Mixtral-8x22B-Instruct variant, which we're launching in the coming hours.

But the fine-tunes keep coming too! On Monday, Microsoft announced WizardLM-2, a fine-tuned Mixtral-8x22B, with much fanfare:

OctoAI moved fast to host WizardLM-2. You can try it out in our playground here. (It’s listed as an “experimental” model, hosted under the name “mixtral-8x22b-finetuned,” because we’ll be evolving this endpoint rapidly to offer the best fine-tuned variants of Mixtral-8x22B).
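OctoAI's text endpoints follow the OpenAI chat-completions schema, so calling the experimental endpoint programmatically looks roughly like the sketch below. The URL and token handling here are assumptions for illustration; only the model name "mixtral-8x22b-finetuned" comes from this post.

```python
import json

# Hedged sketch, assuming an OpenAI-compatible chat-completions API.
# The URL below is an assumption for illustration, not taken from the post.
OCTOAI_CHAT_URL = "https://text.octoai.run/v1/chat/completions"  # assumed

def build_request(prompt: str, max_tokens: int = 256) -> dict:
    """Assemble an OpenAI-style chat-completion payload for the
    experimental endpoint name given in the post."""
    return {
        "model": "mixtral-8x22b-finetuned",  # endpoint name from the post
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

payload = build_request("Summarize mixture-of-experts routing in two sentences.")
print(json.dumps(payload, indent=2))
# POST this payload to OCTOAI_CHAT_URL with an "Authorization: Bearer <token>"
# header using any HTTP client, e.g. requests.post(OCTOAI_CHAT_URL, json=payload, ...)
```

Because the schema is OpenAI-compatible, existing OpenAI client libraries can also be pointed at the endpoint by overriding the base URL.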

However, almost as fast as OctoAI could get it launched, Microsoft withdrew the model from GitHub and Hugging Face because they had missed a step in their release process: toxicity testing.

Predictably, this led to a surge in interest, with Redditors clamoring for the model they presumed was toxic, and many folks mirroring it on Hugging Face. Oh, Reddit. 🤦

This also kicked off a conversation on Hacker News about toxicity testing — what it is, how it's done, and what it's for.

OctoAI partners with Vijil for toxicity testing

At OctoAI, we believe that testing and mitigating toxicity — inherent in LLMs trained on Internet data — is important, especially when these models are deployed in business-critical applications. To help our customers deploy models with confidence, we advise rigorous independent testing before models reach production. With the conversation around toxicity testing heating up, we decided to partner with Vijil to run toxicity tests on WizardLM-2-8x22B and compare it to both the base model and Mixtral 8x7B Instruct.

Vijil provides cloud services to evaluate, improve, and maintain trust in AI agents. The Vijil eval service scores LLMs on reliability for domain-specific tasks, vulnerability to attacks, and propensity for harm under benign and hostile conditions. To score propensity for harm, it measures the likelihood that a model will produce toxic outputs, malicious stereotypes, unfair outcomes, and unethical responses by subjecting it to automated red-teaming tests.

Vijil's red-team tests for toxicity show that WizardLM-2-8x22B scores 98.33, compared to 89.46 for the base Mixtral-8x22B and 92.93 for Mixtral 8x7B Instruct (higher is better).

Vijil Trust Score: Toxicity (Pass Rate %; higher is better)

| Test & Prompts | Mixtral 8x7B Instruct | Base Mixtral 8x22B | WizardLM-2 8x22B |
|---|---|---|---|
| Continuation - 2,850 | 100 | 100 | 100 |
| Adversarial Attacks - 50 | 100 | 100 | 96.0 |
| Flirtation - 1,000 | 99.3 | 93.0 | 99.4 |
| Identity Attacks - 1,000 | 99.0 | 86.8 | 100 |
| Insult - 1,000 | 95.4 | 75.6 | 96.5 |
| Profanity - 1,000 | 96.4 | 79.6 | 96.4 |
| Severe Toxicity - 1,000 | 96.8 | 79.8 | 97.4 |
| Sexually Explicit - 1,000 | 99.5 | 93.7 | 99.3 |
| Threat - 1,000 | 100 | 96.6 | 100 |
| Aggregate - 9,900 | 92.93 | 89.46 | 98.33 |

Vijil tests a model for several categories of toxicity. The Vijil evaluation framework draws test prompts from several sources: the RealToxicityPrompts (RTP) benchmark dataset, a custom adversarial generator model, and hand-crafted prompts from red-team experts. Each test uses the most provocative prompts (according to the prompt authors) in its category and evaluates the model's responses with a toxicity detection model. This standards-based approach means that anyone can use the Vijil eval service to measure model risks systematically.
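At its core, a per-category pass rate like those in the table is the share of responses a toxicity detector judges acceptable. The sketch below illustrates that computation; the classifier scores and the 0.5 cutoff are stand-ins, since Vijil's actual detector and thresholds are not described in this post.

```python
# Minimal sketch of a red-team pass-rate computation, assuming a
# toxicity classifier that returns a score in [0, 1] per response.
# The 0.5 threshold is an assumed cutoff, not Vijil's actual value.
TOXICITY_THRESHOLD = 0.5

def pass_rate(scores, threshold=TOXICITY_THRESHOLD):
    """Percentage of responses whose toxicity score stays below the
    threshold (a higher pass rate means a less toxic model)."""
    if not scores:
        return 0.0
    passed = sum(1 for s in scores if s < threshold)
    return round(100 * passed / len(scores), 2)

# e.g. 97 benign responses and 3 toxic ones out of 100 prompts:
scores = [0.1] * 97 + [0.9] * 3
print(pass_rate(scores))  # 97.0
```

In a full evaluation, each category's prompts are sent to the model under test and the resulting per-category pass rates are combined into an aggregate trust score.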

Get started on OctoAI and learn more about toxicity testing

If you want to learn more about how Vijil automates red-teaming at scale, send a note to contact@vijil.ai. Developers can try WizardLM-2-8x22B on OctoAI today.