Google has developed an AI-powered spam detection system that helps spot «adversarial text manipulations» such as emails containing special characters, emojis, typos, and other tricks that previously slipped past Gmail's defenses.
Touted as «one of the largest defense upgrades in recent years,» the improvement comes in the form of a new text classification system called RETVec (Resilient and Efficient Text Vectorizer).
«To help make text classifiers more robust and efficient, we've developed a novel, multilingual text vectorizer called RETVec that helps models achieve state-of-the-art classification performance and drastically reduces computational cost,» the company said.
Systems such as Gmail, YouTube and Google Play rely on text classification models to identify harmful content including phishing attacks, inappropriate comments, and scams.
This kind of text is harder for machine learning models to classify because bad actors deliberately use adversarial text manipulations to evade the classifiers.
«For example, they will use homoglyphs, invisible characters, and keyword stuffing to bypass defenses,» said the tech giant.
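To illustrate why such manipulations are effective, here is a minimal Python sketch of two of them, homoglyphs and invisible characters, defeating a naive keyword filter. It is not Google's code and does not use RETVec; the blocklist, the sample messages, and the helper names `naive_flag` and `strip_invisible` are hypothetical and exist only for this example.

```python
# Hypothetical blocklist for illustration only.
BLOCKLIST = {"free", "prize"}

# A few common zero-width (invisible) code points; not an exhaustive list.
ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\ufeff"}


def naive_flag(text: str) -> bool:
    """Flag a message if any whitespace-separated word is on the blocklist."""
    return any(word in BLOCKLIST for word in text.lower().split())


def strip_invisible(text: str) -> str:
    """Remove the zero-width characters listed in ZERO_WIDTH."""
    return "".join(ch for ch in text if ch not in ZERO_WIDTH)


plain = "claim your free prize"
# Homoglyphs: Cyrillic 'е' (U+0435) looks like Latin 'e' but is a different code point.
homoglyph = "claim your fr\u0435\u0435 priz\u0435"
# Invisible characters: a zero-width space (U+200B) splits the keyword without showing.
invisible = "claim your fr\u200bee pri\u200bze"

for label, message in [("plain", plain), ("homoglyph", homoglyph), ("invisible", invisible)]:
    print(label, naive_flag(message), naive_flag(strip_invisible(message)))
```

In this sketch, only the unmodified message is caught by the plain keyword match. Stripping zero-width characters recovers the invisible-character case but does nothing for the homoglyph case, which hints at why hand-written preprocessing rules tend to lag behind attackers.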
Due to its novel architecture, RETVec works out of the box on every language and all characters without the need for text preprocessing, making it an ideal candidate for on-device, web, and large-scale text classification deployments.
«Models trained with RETVec exhibit