OpenWeb – Filter Definitions (DM)
The table below lists the filters currently available in OpenWeb's "Settings > Policy > Automated Moderation Filters" section, along with each model's definition.
*Other filters are not currently listed.
| Model | Definition |
| --- | --- |
| Attack on Author | Attacks on the author of an article or a specific post. Flags comments critical of editorial teams. |
| Attack on Commenter | Attacks on a fellow commenter, including comments that aim to harass, degrade, or intimidate others. This filter flags only replies. |
| Flirtation | Pickup lines, compliments on appearance, subtle sexual innuendos, or undertones. |
| Hate Speech | Attacks on individuals based on their characteristics, including attacks on the LGBTQ+ community; such content restricts conversation, fosters controversy, and impedes healthy dialogue. |
| Hebrew Low Quality | Harmful or unwanted comments in the Hebrew language. |
| Hostility | Flags comments that exhibit harmful or undesirable characteristics. |
| Identity Attack | Negative or hateful comments targeting someone based on their identity, such as race, gender, religion, creed, etc. |
| Inappropriate Emoji | Identifies inappropriate or harmful comments containing emojis. It detects irrelevant, excessive, or disruptive emojis, as well as combinations that create unintended or inappropriate meanings. |
| Incivility | Takes action on user-generated content (UGC) based on a grouping of various model scores. |
| Incoherent | Comments that are difficult to understand, illegible, or nonsensical. |
| Inflammatory | Comments intended to provoke, inflame, or incite a negative reaction. |
| Insult | An inflammatory or excessively negative comment directed at an individual or group. |
| Likely to Reject | An overall measure of how likely the comment is to be rejected, based on previous user-generated content (UGC) of a similar nature. |
| Low Quality | Negative or trivial comments that lack insight or constructive argument, or that are off-topic to the conversation. |
| Obscene | Offensive language, including slurs, cursing, etc. |
| Political Misinformation | Identifies and flags comments that resemble known political misinformation or fake-news narratives. This model helps continuously reduce the spread of false information by detecting comments that align with previously identified political misinformation narratives. |
| Potential Spam | Unwanted content, categorized into instances related to finance, politics, and commerce. |
| Profanity | Swear words, curse words, or other obscene and vulgar language. |
| Racism | Flags discrimination and prejudice against people based on their race or ethnicity. |
| Severe Toxicity | A highly hateful, aggressive, or disrespectful comment, likely to prompt a user to leave the conversation. This flag is less sensitive to milder forms of toxicity, such as reactionary cursing. |
| Sexually Explicit | References to sexual acts, bodily fluids, or body parts, as well as lewd or otherwise NSFW content. |
| Spam Similarity | Identifies and flags comments closely resembling previously detected spam. |
| Spanish Low Quality | Predicts whether a Spanish comment should be blocked based on criteria such as obscenity, toxicity, hate speech, or spam promotion. |
| Suspected Bot | Suspends users detected as bots from future commenting. |
| Suspected Spam | Irrelevant and unsolicited commercial content, phishing, external linking, scamming, etc. |
| Threat | A declared intention to inflict pain, injury, or violence on an individual or group. |
| Toxicity | A rude, disrespectful, unreasonable, or charged comment likely to prompt others to leave a discussion. |
| Unsubstantial | Comments that are trivial or brief. |
| Vulgarity | Inappropriate comments in which the text is obscured in an attempt to evade moderation filters. |
Note: The list above is subject to change.
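To illustrate how a score-grouping filter such as Incivility might combine per-model outputs into one decision, here is a minimal sketch. The model names, thresholds, and `should_flag` function are hypothetical assumptions for illustration only, not OpenWeb's actual implementation.

```python
# Hypothetical sketch: combine several per-model scores into a single
# moderation decision, in the spirit of a score-grouping filter.
# All model names and threshold values below are illustrative assumptions.

THRESHOLDS = {
    "toxicity": 0.80,
    "insult": 0.85,
    "profanity": 0.90,
}

def should_flag(scores: dict) -> bool:
    """Flag a comment if any grouped model score meets or exceeds its threshold."""
    return any(
        scores.get(model, 0.0) >= threshold
        for model, threshold in THRESHOLDS.items()
    )

# Example: one score crosses its threshold, so the comment is flagged.
print(should_flag({"toxicity": 0.92, "insult": 0.30}))  # True
print(should_flag({"toxicity": 0.10}))                  # False
```

In practice, a real grouping could also weight scores or require multiple models to agree; the any-threshold rule above is just the simplest form of the idea.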