Even fewer systems can detect the full variety of abuse a single model is trained on, simply because abusive content is combinatorially messy. State-of-the-art natural language processing (NLP) models can reach 90–95% accuracy in identifying explicit, straightforward written abuse, but they tend to fail on covert toxicity, harassment, or coded language. That means roughly 5–10% of harmful content may go undetected, especially when abusers rely on subtle phrasing or terms specific to their cultural setting.
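To make that gap concrete, here is a minimal sketch of how threshold-based flagging typically works. The `score_toxicity()` helper is a hypothetical stand-in for whatever NLP classifier a platform actually uses, and the keyword lookup and threshold value are purely illustrative.

```python
# A minimal sketch of threshold-based abuse flagging. score_toxicity() is a
# hypothetical stand-in for an NLP toxicity model (e.g. a fine-tuned
# transformer); the keyword lookup below only exists so the sketch runs.

EXPLICIT_TERMS = {"kill yourself", "i will hurt you"}   # toy examples

def score_toxicity(text: str) -> float:
    """Stand-in scorer: a real system would call an NLP model here."""
    lowered = text.lower()
    return 0.95 if any(term in lowered for term in EXPLICIT_TERMS) else 0.10

def flag_message(text: str, threshold: float = 0.8) -> bool:
    """Flag a message when the toxicity score crosses the threshold."""
    return score_toxicity(text) >= threshold

# Explicit abuse scores high and is caught; coded or sarcastic hostility
# scores low and slips through -- the 5-10% gap described above.
print(flag_message("I will hurt you"))             # True
print(flag_message("nice 'friends' you have..."))  # False despite hostile intent
```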
The difficulty deepens when you consider that the AI needs to be trained on a wide, varied dataset to learn many different kinds of abusive behaviour. Acquiring high-quality training data that covers diverse abuse types such as harassment, hate speech, and manipulative content is expensive: data acquisition and labeling can account for almost 60% of an AI model's training budget. Abuse that relies on sarcasm or context-dependent meaning requires specialized datasets, which makes producing them costly in both money and time.
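For illustration, a single labeled training example might look like the sketch below; the schema and field names are assumptions for this article, not any vendor's actual annotation format.

```python
# A sketch of one labeled training example under an assumed annotation schema.
from dataclasses import dataclass

@dataclass
class AbuseExample:
    text: str               # the raw chat message
    category: str           # e.g. "harassment", "hate_speech", "manipulation"
    is_abusive: bool        # gold label from human annotators
    language: str           # ISO 639-1 code, needed for multilingual coverage
    requires_context: bool  # True for sarcasm / context-dependent cases

# Context-dependent cases like this one are the expensive part of the dataset:
# they need skilled annotators and the surrounding conversation to label correctly.
example = AbuseExample(
    text="Wow, you're so smart, everyone definitely needs your opinion.",
    category="harassment",
    is_abusive=True,
    language="en",
    requires_context=True,
)
print(example)
```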
Support for multiple languages adds further complexity. To detect abuse effectively across languages, the AI has to understand cultural slang and context, which vary widely. Models trained primarily on English handle this reasonably well, but detection accuracy drops by 15–20% in languages with less training data. OpenAI's research suggests that building reliable multilingual abuse detection requires more data gathering and specialized resources, which only inflates both the time and the cost involved.
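One common way to handle this, sketched below under assumed model names and a stand-in language detector, is to identify the message language and route it to a language-specific detector, falling back to a multilingual model with lower expected accuracy for under-resourced languages.

```python
# A sketch of per-language routing for abuse detection. detect_language()
# and the model registry are illustrative assumptions, not a real API.

def detect_language(text: str) -> str:
    """Stand-in for a language-identification model (returns a rough code)."""
    return "en" if text.isascii() else "other"

# Per-language expected accuracy, reflecting the 15-20% drop outside English.
MODEL_REGISTRY = {
    "en": {"model": "english_abuse_model", "expected_accuracy": 0.93},
    "other": {"model": "multilingual_fallback", "expected_accuracy": 0.75},
}

def route_message(text: str) -> dict:
    """Pick a detector by language; lower-resource languages get the fallback."""
    lang = detect_language(text)
    entry = MODEL_REGISTRY.get(lang, MODEL_REGISTRY["other"])
    return {"language": lang, **entry}

print(route_message("You are pathetic"))        # routed to the English model
print(route_message("Eres patético, idiota"))   # fallback, lower expected accuracy
```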
False positives remain a major problem for nsfw ai chat systems: non-abusive content gets incorrectly flagged as harmful, which can lead to user fatigue and erode trust. Striking the right balance between sensitivity and precision is no small task either; filters that are too strict discourage open dialogue, while lenient ones let potentially menacing material slip through. As Sam Altman of OpenAI put it: "This is one of the biggest challenges we face in AI. Freedom and safety are both critical to a functioning society, but follow-up work would be needed before implementing them in our infrastructure."
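A typical way to manage that trade-off is to tune the flagging threshold on held-out data. The sketch below uses scikit-learn's precision_recall_curve with synthetic labels and scores to pick the loosest threshold that still meets a precision target, so benign users are rarely flagged while recall stays as high as possible.

```python
# A sketch of tuning the flagging threshold on held-out data to cap false
# positives. The labels and scores below are synthetic; a real system would
# use a validation set of moderated messages.
import numpy as np
from sklearn.metrics import precision_recall_curve

y_true = np.array([0, 0, 0, 0, 1, 1, 1, 0, 1, 0])   # 1 = genuinely abusive
scores = np.array([0.1, 0.3, 0.7, 0.2, 0.9, 0.8, 0.6, 0.4, 0.95, 0.15])

precision, recall, thresholds = precision_recall_curve(y_true, scores)

# Keep precision at or above a target so benign users are rarely flagged,
# then take the loosest threshold that still meets it (maximizing recall).
TARGET_PRECISION = 0.90
eligible = [t for p, t in zip(precision[:-1], thresholds) if p >= TARGET_PRECISION]
chosen = min(eligible) if eligible else max(thresholds)
print(f"chosen threshold: {chosen:.2f}")
```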
To sum up, nsfw ai chat systems can detect explicit abuse in many areas, but they are not foolproof because of biased training datasets and the gaps in coverage described above. For more details on this subject, check out nsfw ai chat.