CRYPTO-GRAM

CRYPTO-GRAM (http://www.counterpane.com/crypto-gram.html) is Bruce
"Applied Cryptography" Schneier's excellent email newsletter, and one of
the few things I bother to read every month. It's full of interesting
articles, including this one...


   >           Security Risks of Unicode
   >
   >
   >
   >  Unicode is an international character set.  Like ASCII, it provides a
   >  standard correspondence between the binary numbers that computers
   >  understand and the letters, digits, and punctuation that people
   >  understand.  But unlike ASCII, it seeks to provide a code for every
   >  character in every language in the world.  To do this requires more than
   >  the 256 characters that fit in 8 bits (ASCII proper is a 7-bit,
   >  128-character set); default Unicode uses 16-bit characters, and there are
   >  rules to extend even that.
   >
   >  I don't know if anyone has considered the security implications of this.
   >
   >  Remember all those input validation attacks that were based on replacing
   >  characters with alternate representations, or that explored alternative
   >  delimiters?  For example, there was a hole in the IRIX Web server: if you
   >  could replace spaces with tabs you could fool the parser, and you could use
   >  hexadecimal escapes, strange quoting, and nulls to defeat input validation.
   >
   >  The Unicode specification includes all sorts of complicated new escape
   >  sequences.  They have things called UTF-8 and UTF-16, which allow several
   >  possible representations of various character codes, several different
   >  places where control-characters pop through, a scheme for placing
   >  diacriticals and accents in separated characters (looking very much like an
   >  escape), and hundreds of brand new punctuation characters and otherwise
   >  nonalphabetic characters.
   >
   >  The philosophy behind the Unicode spec is to provide all possible useful
   >  characters for applications that are 8- or 16-bit clean.  This is
   >  admirable, but it is nearly impossible to filter a Unicode character stream
   >  to decide what is "safe" in some application and what is not.
   >
   >  What happens when:
   >
   >  - We start attaching semantics to the new characters as delimiters, white
   >  space, etc?  With thousands of characters and new characters being added
   >  all the time, it will be extremely difficult to categorize all the possible
   >  characters consistently, and where there is inconsistency, there tend to
   >  be security holes.
   >
   >  - Somebody uses "modifier" characters in an unexpected way?
   >
   >  - Somebody uses UTF-8 or UTF-16 to encode a conventional character in a
   >  novel way to bypass validation checks?
   >
   >  With the ASCII character set, we could carefully study a small selection of
   >  characters, categorize them clearly, and make relatively straightforward
   >  decisions about the nature of each character.  And even here, there have
   >  been mistakes (forgetting about tabs, multicharacter control-sequence
   >  snafus, etc).  Still, a careful designer can figure out a safe way to deal
   >  with any possible character that can come off an untrusted wire by
   >  elimination if necessary.
   >
   >  With Unicode, we probably won't be able to get a consistent definition of
   >  what to accept, what is a delimiter under what circumstance, or how to
   >  handle arbitrary streams safely.  It's just a matter of time before simple
   >  validators pass data and upper layer software, trying to be helpful, attach
   >  magic-character semantics, and we have a brand-new variety of security holes.
   >
   >  Unicode is just too complex to ever be secure.
   >
   >  Unicode:
   >  http://www.unicode.org
   >
   >
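
To make the article's "modifier" and white-space questions concrete, here
is a minimal Python sketch. It is an illustration only, not from the article;
the helper names are made up for this example. It shows two spellings of the
same accented letter that compare unequal until normalized, and a few
characters that Python's str.isspace() does or does not treat as white space.

```python
import unicodedata

# Two ways to write "é": one precomposed code point, or a plain "e"
# followed by U+0301 COMBINING ACUTE ACCENT (a "modifier" character).
composed = "\u00e9"
decomposed = "e\u0301"

def naive_equal(a: str, b: str) -> bool:
    """Compare code point by code point, as a simple validator might."""
    return a == b

def normalized_equal(a: str, b: str) -> bool:
    """Compare after normalizing both strings to NFC."""
    nfc = lambda s: unicodedata.normalize("NFC", s)
    return nfc(a) == nfc(b)

# Characters that are not the ASCII space but that str.isspace() accepts:
# NO-BREAK SPACE, EM SPACE, IDEOGRAPHIC SPACE.
unicode_spaces = ["\u00a0", "\u2003", "\u3000"]

# ZERO WIDTH SPACE has "space" in its name, yet str.isspace() rejects it.
zero_width_space = "\u200b"

if __name__ == "__main__":
    print(naive_equal(composed, decomposed))
    print(normalized_equal(composed, decomposed))
    print([c.isspace() for c in unicode_spaces])
    print(zero_width_space.isspace())
```

A filter that splits on " " alone and another layer that splits on anything
isspace() accepts will disagree about where the delimiters are, which is the
kind of inconsistency the article is worried about.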

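The point about encoding "a conventional character in a novel way" can be
sketched the same way. Below, lenient_decode is a hypothetical, deliberately
sloppy UTF-8 decoder written for this example (it handles only 1- and 2-byte
sequences and does not reject overlong forms). A byte-level check for "/"
finds nothing in the input, yet the sloppy decoder produces a "/" afterwards.
Python's own strict decoder rejects the same bytes.

```python
def contains_slash_bytes(data: bytes) -> bool:
    """Byte-level validation check: look for a literal '/' (0x2F)."""
    return b"/" in data

def lenient_decode(data: bytes) -> str:
    """Hypothetical sloppy decoder: accepts overlong 2-byte sequences."""
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            # Plain 7-bit ASCII byte.
            out.append(chr(b))
            i += 1
        elif 0xC0 <= b <= 0xDF and i + 1 < len(data):
            # 2-byte sequence: 110xxxxx 10yyyyyy, with no check that the
            # resulting value actually needs two bytes.
            out.append(chr(((b & 0x1F) << 6) | (data[i + 1] & 0x3F)))
            i += 2
        else:
            raise ValueError("byte not handled by this sketch")
    return "".join(out)

def strict_decoder_rejects(data: bytes) -> bool:
    """True if Python's built-in UTF-8 decoder refuses the input."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return True
    return False

# 0xC0 0xAF is an overlong 2-byte encoding of "/" (U+002F).
overlong_path = b"..\xc0\xaf.."

if __name__ == "__main__":
    print(contains_slash_bytes(overlong_path))
    print(lenient_decode(overlong_path))
    print(strict_decoder_rejects(overlong_path))
```

The usual defense is to decode strictly first and validate the decoded
characters, rather than validating raw bytes and decoding later.
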
HURL: fogo mailing list archives, maintained by Gerald Oskoboiny