Analyzing Homographic Phishing Characters
е is not e!
Recently, I’ve seen a couple of phishing scams hit a few of my friends on Facebook (yes, I still use that thing), and when I asked how they fell for something like this, they replied, “Well, it looked nearly identical to the real Facebook URL.”
This made me dig into the topic a bit further. I thought the general lecture I gave my friends, “it’s common sense to check links so you’re not actually going to, like, fàcèbook or whatever,” kept them safe. But when two characters look exactly the same yet are two different symbols (such as the Cyrillic а and the Latin a), that becomes a real concern for phishing attacks. Allow me to elaborate.
Let’s compare this character, е, with this one, e.
Googling each of them on its own returns different results, because to a search engine they are two unrelated characters.
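A quick way to confirm that these really are two different characters is to ask Python for each one’s Unicode code point and name. This is a minimal sketch; the Cyrillic letter is written as the escape \u0435 (the code point on the compart.com page in the sources) so it can’t be mistaken for the Latin one in the code itself.

```python
import unicodedata

# Written as an escape so the Cyrillic letter can't be confused with Latin "e" in the source.
cyrillic_e = "\u0435"
latin_e = "e"

for ch in (cyrillic_e, latin_e):
    # Print the code point and the official Unicode name of each character.
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))

# Identical on screen, but not equal as strings.
print(cyrillic_e == latin_e)
```

The first character is reported as CYRILLIC SMALL LETTER IE, the second as LATIN SMALL LETTER E, and the comparison prints False.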
So why does this exist?
I’m not going to get really technical about the history of alphabets here, but when the Latin alphabet, the Cyrillic alphabet[s] and the Greek alphabet were developed, several of their letters descended from the same ancestors and ended up looking alike. For example, both the Latin character “a” and the Cyrillic character “а” were derived from the Greek character “α” (alpha), but they belong to scripts used in different regions of the world.
Similar confusion exists within the Latin alphabet itself: the lowercase L “l” and the uppercase i “I” look nearly identical in many fonts. Likewise, the pair “rn” can trick people into reading it as the character “m”. At first glance, a domain such as “rnicrosoft.com”, or even “steamcornmunlty.com” (which also swaps the “i” for an “l”), could pass for a legitimate website, but “microsoft” is not “rnicrosoft” and “steamcommunity” is not “steamcornmunlty”.
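One common way to catch these tricks is to reduce every name to a “skeleton” that collapses lookalike spellings, and then compare skeletons. The sketch below is a toy version of that idea, loosely modeled on the skeleton concept in Unicode Technical Standard #39; the lookup table and the helper names skeleton and looks_like are hypothetical and cover only the handful of ASCII lookalikes discussed above.

```python
# Hypothetical, deliberately tiny table of ASCII lookalikes. A real checker would use a
# much larger list, such as the Unicode confusables data behind UTS #39.
MULTI_CHAR = [("rn", "m"), ("vv", "w")]
SINGLE_CHAR = str.maketrans({"i": "l", "1": "l", "0": "o"})


def skeleton(domain: str) -> str:
    """Collapse visually similar spellings of a domain into one canonical form."""
    s = domain.lower()  # DNS names are case-insensitive, so "I" becomes "i" first
    for seq, replacement in MULTI_CHAR:
        s = s.replace(seq, replacement)
    return s.translate(SINGLE_CHAR)


def looks_like(candidate: str, target: str) -> bool:
    """True if candidate is a different name whose skeleton matches target's."""
    return candidate.lower() != target.lower() and skeleton(candidate) == skeleton(target)


for fake in ("rnicrosoft.com", "steamcornmunlty.com"):
    print(fake, looks_like(fake, "microsoft.com"), looks_like(fake, "steamcommunity.com"))
```

Each fake matches only the site it imitates. A crude table like this also flags innocent names (any legitimate “rn” collapses to “m”), which is why real tools rely on curated confusables data.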
Strings like these are called homographs: they look alike but are made up of different characters (the individual lookalike characters are often called homoglyphs). Hostnames in the Domain Name System (DNS) are traditionally limited to ASCII letters, digits, and hyphens, but internationalized domain names (IDNs) let other scripts appear in a fully qualified domain name (FQDN) by encoding non-ASCII labels into an ASCII form that DNS can carry. From Wikipedia: “An internationalized domain name (IDN) is an Internet domain name that contains at least one [character] displayed in software applications, in a [foreign-based] language-specific script or alphabet, such as characters with diacritics or ligatures.”
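Concretely, each non-ASCII label is converted to ASCII with Punycode (RFC 3492) and marked with the “xn--” prefix, and that ASCII string is what DNS actually resolves. Python’s built-in “idna” codec, which follows the older IDNA 2003 rules rather than the newer IDNA 2008 ones, is enough to see this in action; “bücher.example” is only an illustrative name.

```python
# Python's built-in "idna" codec follows the IDNA 2003 rules.
# "bücher.example" is just an example name.
unicode_name = "bücher.example"

# Encoding Punycode-encodes each non-ASCII label and adds the "xn--" prefix;
# pure-ASCII labels such as "example" pass through unchanged.
ascii_name = unicode_name.encode("idna")
print(ascii_name)

# Decoding turns the ASCII form back into the Unicode name for display.
print(ascii_name.decode("idna"))
```

The encoded form comes out as b'xn--bcher-kva.example', and decoding it gives the original name back.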
Phishing attacks take advantage of this by swapping in lookalike characters from another script, for example writing facеbook (with a Cyrillic е) instead of facebook, or gооglе (with Cyrillic characters) instead of google.
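Because a lookalike domain only fools the eye, its ASCII form gives it away. The following sketch, with hypothetical helper names ascii_form and non_ascii_chars, prints the Punycode form of a domain and lists every non-ASCII character in it by code point and Unicode name. The Cyrillic letters are written as escapes (\u0435 is the Cyrillic ie, \u043e the Cyrillic o) so the example is unambiguous.

```python
import unicodedata


def ascii_form(domain: str) -> str:
    """Return the ASCII (Punycode) form of a domain using Python's IDNA 2003 codec."""
    return domain.encode("idna").decode("ascii")


def non_ascii_chars(domain: str) -> list[tuple[str, str]]:
    """List (code point, Unicode name) for every non-ASCII character in the domain."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch))
        for ch in domain
        if ord(ch) > 0x7F
    ]


# Two lookalikes built from Cyrillic letters, followed by the genuine name.
for name in ("fac\u0435book.com", "g\u043e\u043egl\u0435.com", "facebook.com"):
    print(name, "->", ascii_form(name))
    for code_point, char_name in non_ascii_chars(name):
        print("   ", code_point, char_name)
```

The genuine facebook.com comes back unchanged with no flagged characters, while the lookalikes turn into xn-- names with their Cyrillic letters listed.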
Additionally, DNS labels are not case-sensitive, meaning “ExAmPlE.com” and “example.com” are the same name. Phishers can exploit this case-insensitivity too, for instance by writing an uppercase “I” where a reader expects a lowercase “l”, to trick people into clicking impostor links.
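A minimal sketch of how DNS compares names, assuming plain ASCII names where only the letters A to Z fold to lowercase (the rule RFC 4343 spells out). The helper same_dns_name and the lure paypaI.com (with an uppercase i) are hypothetical examples: in many sans-serif fonts paypaI.com reads as paypal.com, yet DNS treats it as paypai.com.

```python
import string

# DNS folds only ASCII letters when comparing names; every other character is compared as-is.
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def same_dns_name(a: str, b: str) -> bool:
    """Compare two ASCII domain names the way DNS does, ignoring letter case."""
    return a.translate(_ASCII_LOWER) == b.translate(_ASCII_LOWER)


print(same_dns_name("ExAmPlE.com", "example.com"))
# Hypothetical lure: an uppercase "I" standing in for a lowercase "l".
print(same_dns_name("paypaI.com", "paypai.com"))
print(same_dns_name("paypaI.com", "paypal.com"))
```

This prints True, True, then False: the link that looks like paypal.com actually points to a different domain.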
Sources:
http://www.tcpipguide.com/free/t_DNSLabelsNamesandSyntaxRules.htm
https://www.ietf.org/rfc/rfc3492.txt
http://www.irongeek.com/i.php?page=security/out-of-character-use-of-punycode-and-homoglyph-attacks-to-obfuscate-urls-for-phishing
https://en.wikipedia.org/wiki/Internationalized_domain_name
https://en.wikipedia.org/wiki/Punycode
https://en.wikipedia.org/wiki/Cyrillic_alphabets
https://en.wikipedia.org/wiki/A_(Cyrillic)
https://en.wikipedia.org/wiki/A
https://en.wikipedia.org/wiki/IDN_homograph_attack
https://www.compart.com/en/unicode/U+0435
https://tools.ietf.org/html/rfc5892