Content Moderation by Social Media Applications


Social media has grown into the lens through which we view the world, form perspectives, and share opinions, ideas, feelings and information with either a niche audience or the general public. Users can freely upload content to their social media accounts. Taking advantage of this freedom, some people misuse social platforms by posting objectionable content and sharing it with a vast audience. Objectionable content can include pictures, text or videos depicting sexual acts, violence, hate speech, terrorism or fake news. The impact of such content on younger users, for whom social media is often just a medium of entertainment, is easy to imagine. Controlling the presence of such offensive content is therefore essential to limit ordinary users' exposure to it.

Recently, the Indian microblogging application ‘Koo’ stated that it is on the path to becoming a platform that integrates artificial intelligence (AI) and machine learning to optimise its content moderation. Content moderation can help lower the prevalence of offensive content on social media platforms. It is the screening of inappropriate material that users post, and the procedure relies on pre-established guidelines for monitoring content. Uploaded content is closely monitored by experts, AI-enabled mechanisms, or both; if it does not satisfy the guidelines, the content is flagged and removed. Every social media app has guidelines on the basis of which it identifies unacceptable content. For instance, ‘Snapchat’ has community guidelines prohibiting sexual content, harassment, threats, violence and deceptive information; if a user violates these guidelines, Snapchat reserves the right to remove the offending content and to terminate or limit the visibility of the user's account. As per the latest transparency report published by Snapchat, in the first half of 2022 it enforced against 5,688,970 pieces of content globally that violated its policies.


Let's analyse how social media applications govern all this content. Due to the sheer amount of content posted every minute, manual moderation by humans alone is not scalable, so artificial intelligence is used alongside manual review to identify and filter content. AI mechanisms learn to detect whether a piece of content contains anything offensive and then determine the action to take, such as removing it from the platform or reducing its distribution. If a piece of content requires further scrutiny, AI routes it to a human review team, which takes the final decision. Several moderation methods are in use. In pre-moderation, content a user posts is reviewed before it becomes visible to the public, while in post-moderation the content is reviewed after being published. Reactive moderation allows users to report or flag content they find inappropriate, after which it is reviewed by AI mechanisms or human moderators. Finally, in distributed moderation, users themselves determine whether content is inappropriate: content reported several times is automatically hidden. In addition, posts that might seem sensitive are often displayed with a warning so that people know what the content contains before they view it.
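The division of labour described above can be sketched in a few lines of code. This is a toy illustration, not any platform's actual system: the thresholds, the `Post` class, and the stand-in classifier are all assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical thresholds -- real platforms tune these per policy.
REPORT_THRESHOLD = 5   # distributed moderation: auto-hide after N user reports
REMOVE_CUTOFF = 0.9    # pre-moderation: auto-remove above this classifier score
REVIEW_CUTOFF = 0.5    # uncertain scores escalate to the human review team

@dataclass
class Post:
    text: str
    reports: int = 0
    hidden: bool = False

def classifier_score(text: str) -> float:
    """Stand-in for an ML classifier; matches a toy list of banned words."""
    banned = {"hate", "violence"}
    return 1.0 if banned & set(text.lower().split()) else 0.0

def pre_moderate(post: Post) -> str:
    """Pre-moderation: decide before the post becomes publicly visible."""
    score = classifier_score(post.text)
    if score >= REMOVE_CUTOFF:
        return "removed"
    if score >= REVIEW_CUTOFF:
        return "human_review"   # borderline content goes to human reviewers
    return "published"

def register_report(post: Post) -> None:
    """Distributed moderation: hide a post once enough users flag it."""
    post.reports += 1
    if post.reports >= REPORT_THRESHOLD:
        post.hidden = True
```

In this sketch, confident classifier verdicts are handled automatically, borderline cases are escalated to humans, and user reports provide a second, crowd-driven safety net.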

In spite of such regulated systems for controlling user-generated content, we regularly come across pictures and videos that are offensive or vulgar and clearly fall outside the community guidelines of these applications. The question is how this material is freely uploaded and distributed despite such multi-layered protective mechanisms. The AI technology used for moderation relies on algorithms trained on datasets that mark certain words and phrases as objectionable, and AI mechanisms restrict those words in audio and text, but users bypass this process using algospeak. Algospeak means replacing the words disfavoured by AI with inoffensive or innocuous substitutes: for instance, ‘pron’ replaces ‘porn’ and ‘unalive’ replaces ‘kill’. Content creators use other means of evasion as well, including alphanumeric characters (@, $, *, #) when spelling out curse words, which reduces the probability of being identified by AI moderators. Another malicious technique is cloaking, where links to harmful content are disguised as safe: the links are carefully cloaked so that when a moderator follows them, they are redirected to an innocuous website that adheres to the guidelines. Blurring explicit videos or pictures can also shield objectionable content from AI moderation. Subtly altering content to confuse AI moderation models into classifying it as compliant with the guidelines is yet another way of breaching the system.
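The cat-and-mouse game between keyword filters and character-substitution tricks can be demonstrated with a small sketch. The blocklist, the substitution table, and both filter functions below are illustrative assumptions; production systems use far larger mapping tables and learned models rather than hand-written rules.

```python
import re

BLOCKLIST = {"porn", "kill"}   # toy list of disallowed words (illustrative)

# Common look-alike substitutions used to dodge keyword filters.
SUBSTITUTIONS = str.maketrans({"0": "o", "1": "i", "3": "e", "@": "a", "$": "s"})

def naive_filter(text: str) -> bool:
    """Exact keyword match: variants such as 'p0rn' or 'k1ll' slip through."""
    return any(word in BLOCKLIST for word in text.lower().split())

def normalized_filter(text: str) -> bool:
    """Map look-alike characters back and drop stray symbols before matching."""
    cleaned = text.lower().translate(SUBSTITUTIONS)
    cleaned = re.sub(r"[^a-z\s]", "", cleaned)  # strip leftover punctuation
    return any(word in BLOCKLIST for word in cleaned.split())
```

Here `naive_filter("p0rn")` returns `False` while `normalized_filter("p0rn")` returns `True`. Note, however, that deliberate respellings like ‘pron’ or ‘unalive’ still evade this kind of normalization, which is one reason human review remains necessary.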


AI-enabled mechanisms are undoubtedly fast, but human intervention is essential to curb such abuses. A foolproof defence cannot be built on a single moderation technique; an efficient content moderation system is an integrated approach combining humans and technology. Technology-powered moderation makes the process swift and continuous, especially for platforms that handle large volumes of end-user content, while human moderators improve the accuracy and efficiency of the content verification carried out by AI.