OpenWeb – Filter Definitions (DM)

The table below shows OpenWeb’s current breakdown of the filters available in the “Settings > Policy > Automated Moderation Filters” section, accompanied by the model definitions.

*Other filters are not currently listed.

| Model | Definition |
| --- | --- |
| Attack on Author | Attacks on the author of an article or a specific post. Flags comments critical of editorial teams. |
| Attack on Commenter | Attacks on a fellow commenter, including comments that aim to harass, degrade, or intimidate others. This filter flags only replies. |
| Flirtation | Pickup lines, compliments on appearance, subtle sexual innuendos, or undertones. |
| Hate Speech | Attacks on individuals based on their characteristics, including attacks on the LGBTQ+ community; such comments restrict conversation, foster controversy, and impede healthy dialogue. |
| Hebrew Low Quality | Harmful or unwanted comments written in Hebrew. |
| Hostility | Flags comments that exhibit harmful or undesirable characteristics. |
| Identity Attack | Negative or hateful comments targeting someone based on their identity, such as race, gender, religion, or creed. |
| Inappropriate Emoji | Identifies inappropriate or harmful comments containing emojis. Detects irrelevant, excessive, or disruptive emojis, as well as combinations that create unintended or inappropriate meanings. |
| Incivility | Takes action on user-generated content (UGC) based on a grouping of multiple model scores. |
| Incoherent | Comments that are difficult to understand, illegible, or nonsensical. |
| Inflammatory | Comments intended to provoke, inflame, or incite a negative reaction. |
| Insult | An inflammatory or excessively negative comment directed at an individual or group. |
| Likely to Reject | Overall measure of how likely a comment is to be rejected, based on previous user-generated content (UGC) of a similar nature. |
| Low Quality | Negative or trivial comments that lack insight or constructive argument, or that may be off-topic to the conversation. |
| Obscene | Offensive language, including slurs, cursing, etc. |
| Political Misinformation | Identifies and flags comments that resemble known political misinformation or fake-news narratives, helping to continuously reduce the spread of false information by detecting comments that align with previously identified narratives. |
| Potential Spam | Unwanted content, including instances related to finance, politics, and commerce. |
| Profanity | Swear words, curse words, or other obscene and vulgar language. |
| Racism | Flags discrimination and prejudice against people based on their race or ethnicity. |
| Severe Toxicity | A highly hateful, aggressive, or disrespectful comment, likely to prompt a user to leave the conversation. This flag is less sensitive to milder forms of toxicity, such as reactionary cursing. |
| Sexually Explicit | References to sexual acts, fluids, body parts, or otherwise lewd or NSFW content. |
| Spam Similarity | Identifies and flags comments closely resembling previously detected spam. |
| Spanish Low Quality | Predicts whether a Spanish comment should be blocked based on criteria such as obscenity, toxicity, hate speech, or spam promotion. |
| Suspected Bot | Suspends users detected as bots from future commenting. |
| Suspected Spam | Irrelevant and unsolicited commercial content, phishing, external linking, scamming, etc. |
| Threat | Describes an intention to inflict pain, injury, or violence on an individual or group. |
| Toxicity | A rude, disrespectful, unreasonable, or charged comment likely to prompt others to leave a discussion. |
| Unsubstantial | Comments that are trivial or brief. |
| Vulgarity | Inappropriate comments in which the text is obscured in an attempt to evade moderation filters. |

Note: The list above is subject to change.
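
Some of these filters (Incivility, for example) act on a grouping of underlying model scores rather than a single signal. As a rough illustration of what threshold-based grouping could look like, the Python sketch below combines per-model scores against configurable thresholds. The model names echo the table above, but the thresholds, score source, and any-threshold decision rule are assumptions for illustration only, not OpenWeb's actual implementation or API.

```python
from dataclasses import dataclass

# Hypothetical per-model thresholds: a comment is flagged when any model's
# score meets or exceeds its threshold (an assumed "any-of" grouping rule,
# not OpenWeb's actual logic).
THRESHOLDS = {
    "Toxicity": 0.80,
    "Insult": 0.75,
    "Threat": 0.60,
    "Profanity": 0.85,
}


@dataclass
class ModerationResult:
    flagged: bool
    triggered: list[str]  # names of models whose thresholds were met


def moderate(scores: dict[str, float]) -> ModerationResult:
    """Flag a comment when any configured model score crosses its threshold.

    `scores` maps model names (as in the table above) to values in [0, 1],
    as produced by some upstream classifier (assumed, not an OpenWeb API).
    """
    triggered = [
        model
        for model, threshold in THRESHOLDS.items()
        if scores.get(model, 0.0) >= threshold
    ]
    return ModerationResult(flagged=bool(triggered), triggered=triggered)


# Example: scores for a single comment from hypothetical upstream models.
result = moderate({"Toxicity": 0.91, "Insult": 0.40, "Threat": 0.10})
print(result)  # ModerationResult(flagged=True, triggered=['Toxicity'])
```

In practice, thresholds like these would be tuned per community, and the any-threshold rule shown here is only one simple way to aggregate scores; weighted or combined scoring is equally plausible.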