security-services message

Subject: Anonymity: The real note
From: Marlena Erdos <marlena@us.ibm.com>
To: security-services@lists.oasis-open.org
Date: Mon, 08 Oct 2001 20:45:29 -0400
Dear SAML'ers

This is the "real" write-up on anonymity.
However, please read the "pre-note" first.
Thanks.

---------------------------------------------------------

Here's the outline of this note:

   Definitions that Relate to Anonymity
   Pseudonymity & Anonymity
   Behavior & Anonymity
   Upshot for SAML (aka Executive Summary)
   References


Definitions that Relate to Anonymity/

   I found no definition of anonymity that I consider
satisfying for all cases.  Many definitions deal with the simple
case of a sender and a message, and discuss "anonymity"
in terms of not being able to link a given sender to
a sent message, or a message back to a sender. [1]
  And while that "works" for the "one off" case,
it ignores the aggregation of information that is possible
over time based on *behavior* rather than an identifier.
(More about this later in the note).

  That said, I found two "notions" which I believe are generally
useful, and that relate to each other.

  The first notion is to think about anonymity as being "within a set".
Here's a quote (again from [1]):

"To enable anonymity of a subject, there always has to be
an appropriate set of subjects with potentially the same attributes....
... Anonymity is the stronger, the larger the respective
anonymity set is and the more evenly distributed the sending or
receiving, respectively, of the subjects within that set is".

And I'll add, "the more uniformly distributed the behavior of
the users within the set"!
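
  One hedged way to make "evenly distributed" concrete is to compute
the Shannon entropy of who acts within the set. This measure is not
taken from [1]; it is an assumption added here for illustration, and
the subjects and counts below are invented.

```python
import math

def effective_set_size(action_counts):
    """Return 2**H, where H is the Shannon entropy (in bits) of who acts.

    action_counts maps each subject to how many actions they performed.
    A uniform set of n subjects yields n; a skewed set yields less.
    """
    total = sum(action_counts.values())
    entropy = 0.0
    for count in action_counts.values():
        if count > 0:
            p = count / total
            entropy -= p * math.log2(p)
    return 2 ** entropy

# Hypothetical data: four subjects in both sets, 40 actions in total.
uniform = {"a": 10, "b": 10, "c": 10, "d": 10}
skewed = {"a": 37, "b": 1, "c": 1, "d": 1}

print(effective_set_size(uniform))
print(effective_set_size(skewed))
```

  Both sets have four members, but in the skewed one almost every
action comes from a single subject, so the set offers far less cover
than its nominal size suggests.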

  This is relevant to SAML because of our use of "authorities".
Even if a Subject is "anonymous", that subject is
still identifiable as a member of the set of Subjects within the domain
of the relevant authority.
  In the case where aggregating attributes of the user are provided,
the "set" can become much smaller.  For example, let's say the
user is "anonymous" but has the attribute of
"student in Course 6@mit.edu". Certainly, the number of Course 6
students is smaller than the number of MIT-affiliated persons, which is
smaller than the number of users "everywhere".
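
  The narrowing can be sketched in a few lines. The directory, the
subject names, and the anonymity_set helper are all hypothetical;
nothing here is a SAML structure.

```python
# Hypothetical directory held by an authority: (subject, affiliation, course).
directory = [
    ("s1", "mit.edu", "6"),
    ("s2", "mit.edu", "6"),
    ("s3", "mit.edu", "18"),
    ("s4", "mit.edu", None),      # affiliated, but not enrolled in a course
    ("s5", "example.org", None),  # a user from "everywhere" else
]

def anonymity_set(records, affiliation=None, course=None):
    """Subjects still consistent with the attributes that were revealed."""
    return [
        subject
        for subject, aff, crs in records
        if (affiliation is None or aff == affiliation)
        and (course is None or crs == course)
    ]

everyone = anonymity_set(directory)
mit = anonymity_set(directory, affiliation="mit.edu")
course6 = anonymity_set(directory, affiliation="mit.edu", course="6")

print(len(everyone), len(mit), len(course6))
```

  Each revealed attribute removes the subjects that do not match it, so
the "anonymous" user hides among fewer and fewer candidates.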

  Why does this matter?  Because of the second notion (from
[2]). The idea is that a lack of anonymity gives an adversary
the ability to harm.  Here's a quote:

  "Both anonymity and pseudonymity protect the privacy of the user's
location and true name. Location refers to the actual physical connection
to the system. The term ``true name'' was introduced by Vinge and
popularized by May to refer to the legal identity of an individual.
Knowing someone's true name or location allows you to hurt him or her."


  And in a nice unification of the notion of anonymity within a set and
ability to harm, we have (also from [2]), the following:
  "We might say that a system is partially anonymous if an adversary can
only narrow down a search for a user to one of a ``set of suspects.'' If
the set is large enough, then it is impractical for an adversary to act
as  if any single suspect were guilty. On the other hand, when the set
of suspects is small, mere suspicion may cause an adversary to take
action against all of them."

  In SAML, the best we can ever do is "partial anonymity" because of
our use of authorities.  And we can do a lot worse depending on
how identifiers are employed.  And users can screw themselves despite
our best efforts.
  The next section discusses pseudonymity and how it relates to
anonymity. The following section discusses behavior and how unusual
behavior can serve to defeat anonymity.

Pseudonymity & Anonymity/

  Apart from "legal identity", any identifier for a Subject
can be considered a pseudonym.  And even notions like
"holder of key" can be considered as serving as the moral
equivalent of a pseudonym (MEOP) in linking an action (or set
of actions) to a Subject. Finally, even a description such
as "the user that just requested access to object XYZ at time 23:34"
can serve as a MEOP.
   My point is that, with respect to "ability to harm",
it makes no difference whether the user is described with
an identifier or described by behavior (i.e. use of a key,
or performance of an action).
   What does make a difference is how often the MEOP
is used.
   [2] gives a taxonomy of pseudonyms starting from
personal pseudonyms (like nicknames) that are used all
the time, through various types of role pseudonyms (e.g.
Secretary of Defense), on to "one-time-use" pseudonyms.
   Only one-time-use pseudonyms can give you anonymity (within
SAML, consider this as "anonymity within a set").
The more often you use a given pseudonym, the more you reduce
your anonymity and the more likely it is that you can be harmed.
(When I say pseudonym here I mean MEOP as well).
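
  A minimal sketch of why reuse matters, from the point of view of an
observer who groups actions by whatever pseudonym (or MEOP) is attached
to them. The pseudonym "nick42", the actions, and the linkable_groups
helper are hypothetical.

```python
from collections import defaultdict
import secrets

def linkable_groups(events):
    """Group observed actions by the pseudonym (or MEOP) attached to them."""
    groups = defaultdict(list)
    for pseudonym, action in events:
        groups[pseudonym].append(action)
    return dict(groups)

actions = ["read record 1", "read record 2", "read record 3"]

# Persistent pseudonym: every action lands in one growing profile.
reused = linkable_groups([("nick42", a) for a in actions])

# One-time-use pseudonyms: a fresh random value for each action.
one_time = linkable_groups([(secrets.token_hex(16), a) for a in actions])

print(len(reused), len(one_time))
```

  With the reused pseudonym the observer holds one profile containing
all three actions; with one-time-use values the identifier alone links
nothing, although the behavior itself may still do so.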

    This leads on to the discussion of behavior.

Behavior & Anonymity/

   As Joe Klein can attest, anonymity isn't all it is cracked up to
be.
   Klein is the "Anonymous" who authored Primary Colors.  Despite
his denials he was unmasked as the author by Don Foster, a Vassar
professor who did a forensic analysis of the text of Primary Colors.
Foster compared that text with texts from a list of suspects that he
devised based on their knowledge bases and writing proclivities.
   It was Klein's idiosyncratic usages that did him in (though apparently
all authors have them).
   The relevant point for SAML is that an "anonymous" user (even one that
is never named) can be identified enough to be harmed by repeated
unusual behavior.  Here are some examples:
   A user who each Tuesday at 21:00 accesses a database that correlates
finger lengths and life span starts to be non-anonymous.  Depending on
that user's other behavior, she or he may become "traceable" [3] in
that other "identifying" information may be collected.
   A user who routinely buys an unusual set of products from a
networked vending machine certainly opens themselves to harm (for
example, by an adversary booby-trapping those products).
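
  The Tuesday example can be sketched as follows. The log entries,
identifiers, and the recurring_patterns helper are hypothetical. Every
request carries a different one-time identifier, yet the repeated
(weekday, hour, resource) pattern still stands out.

```python
from collections import Counter
from datetime import datetime

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

# Hypothetical access log: (one-time identifier, timestamp, resource).
log = [
    ("id-01", datetime(2001, 9, 4, 21, 0), "finger-length-db"),
    ("id-02", datetime(2001, 9, 5, 10, 15), "catalog"),
    ("id-03", datetime(2001, 9, 11, 21, 0), "finger-length-db"),
    ("id-04", datetime(2001, 9, 13, 14, 30), "catalog"),
    ("id-05", datetime(2001, 9, 18, 21, 0), "finger-length-db"),
]

def recurring_patterns(entries, min_repeats=3):
    """Return (weekday, hour, resource) patterns seen at least min_repeats times."""
    counts = Counter(
        (WEEKDAYS[ts.weekday()], ts.hour, resource)
        for _identifier, ts, resource in entries
    )
    return {pattern: n for pattern, n in counts.items() if n >= min_repeats}

patterns = recurring_patterns(log)
print(patterns)
```

  The identifiers never repeat, so they link nothing. The behavior does
the linking instead, which is why it acts as a MEOP.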

Upshot for SAML (aka Executive Summary)/

   Origin site authorities (i.e. Authentication Authorities and
Attribute Authorities) can provide a degree of "partial anonymity"
by employing one-time-use identifiers or keys (for the "holder of
key" case).
   This anonymity is "partial" at best because the Subject is
necessarily confined to the set of Subjects in a relationship
with the Authority.
   This set may be further reduced (thus further reducing anonymity)
when aggregating attributes are used that further subset the user
community at the origin site.
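
   A minimal sketch of one-time-use identifiers at an origin site,
assuming the authority keeps a private map back to the real Subject.
The OriginAuthority class and its method names are hypothetical and are
not part of any SAML specification or library.

```python
import secrets

class OriginAuthority:
    """Hypothetical authority that hands out one-time-use identifiers."""

    def __init__(self):
        # one-time identifier -> real subject; never leaves the authority
        self._issued = {}

    def issue_identifier(self, subject):
        """Return a fresh, unguessable identifier for a single assertion."""
        identifier = secrets.token_urlsafe(16)
        self._issued[identifier] = subject
        return identifier

    def resolve(self, identifier):
        """Map an identifier back to its subject (authority-side only)."""
        return self._issued.get(identifier)

authority = OriginAuthority()
first = authority.issue_identifier("alice")
second = authority.issue_identifier("alice")

print(first != second)
```

   A relying party that sees both identifiers cannot tell from the
values alone that they name the same Subject. It still learns which
authority issued them, which is why the anonymity stays "partial".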

   Users who truly care about anonymity must take care to
disguise or avoid unusual patterns of behavior that could
serve to "de-anonymize" them over time.



REFERENCES
1. Anonymity, Unobservability, and Pseudonymity --
A Proposal for Terminology
Andreas Pfitzmann
Marit Köhntopp
www.cert.org/IHW2001/terminology_proposal.pdf


2. The Free Haven Project:
Distributed Anonymous Storage Service
Roger Dingledine & Michael J. Freedman & David Molnar
http://www.freehaven.net/paper/node6.html
http://www.freehaven.net/paper/node7.html

3. Pooling Intellectual Capital:
Thoughts on Anonymity, Pseudonymity, and Limited Liability in Cyberspace
David G. Post
http://www.cli.org/DPost/paper8.htm

-------------------------

That is it.

Questions and comments are welcomed.

Regards,
Marlena