The following appears in the December 1997 issue of The Active Voice
An Activist's Guide to Unique Identifiers
by Anna Forbes, MSS
In its recent position paper on "Monitoring of the HIV Epidemic", NAPWA expresses guarded support for HIV case reporting "only using unique or coded identifiers that insure privacy and confidentiality of the individual".
So what are these unique identifiers people are talking about? How do they work? Since just about any code can be cracked, why should I trust them to protect my privacy?
These are the very legitimate questions I'll try to address in this article.
Unique identifiers (UI's) are nothing new. We use them every day in the form of telephone numbers, zip codes, credit card numbers, product serial numbers, etc. Any number or letter-number code that has a one-to-one correspondence to a given person, thing or location is a UI. Your telephone number, for example, only rings in your house. When people call you, they don't have to be concerned that the phone will ring in someone else's house (assuming that they dial correctly) because your phone number corresponds with your phone on a one-to-one basis.
For HIV case reporting, we need a UI that has a couple of other important characteristics. Specifically, it needs to use common data elements, be reproducible and have a low duplication rate.
Aaaaaghhh.....techno-speak!!! What does all that mean?
Relax. Remember, you are the people who taught yourselves enough biochemistry to be able to understand how drugs work (and don't work) in the body. This is nowhere near as difficult as that.
Data elements are the chunks of information that UI's are made out of. Maryland has a UI system for HIV case reporting, for example, in which the data elements are the last four numbers of a person's Social Security number plus his/her birth date and codes indicating race and gender. The testing provider arranges these data elements in a specific way to produce a twelve-digit UI code. The code gets attached to the person's blood sample before it goes to the lab for HIV testing.
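For readers who like to see things spelled out, here is a minimal sketch in Python of how data elements might be arranged into a code. The field order, the six-digit MMDDYY date and the one-digit race and gender codes are all assumptions made for this example (they happen to add up to twelve digits). Maryland's actual layout and code tables are not given in this article.

```python
from datetime import date

# Hypothetical one-digit codes; the real code tables are not given in the article.
RACE_CODES = {"white": "1", "black": "2", "hispanic": "3", "asian": "4", "other": "9"}
GENDER_CODES = {"male": "1", "female": "2"}


def make_ui(ssn_last4: str, birth_date: date, race: str, gender: str) -> str:
    """Arrange public data elements into a 12-digit code (assumed layout)."""
    if len(ssn_last4) != 4 or not ssn_last4.isdigit():
        raise ValueError("ssn_last4 must be exactly four digits")
    # 4 digits (SSN) + 6 digits (MMDDYY) + 1 digit (race) + 1 digit (gender)
    return (
        ssn_last4
        + birth_date.strftime("%m%d%y")
        + RACE_CODES[race]
        + GENDER_CODES[gender]
    )


ui = make_ui("6789", date(1965, 3, 14), "black", "female")
print(ui)  # 678903146522
```

Notice that nothing is hidden here: anyone who knows the layout can read the birth date straight out of the code.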
Now, we have public data elements and private data elements. Public data elements are pieces of information about you that show up in public records all the time, such as birthdate, race, gender, Social Security number, etc. Every time you turn around, someone is requiring you to list that information on a form and then entering it from the form into a data base. The information pieces that show up at the Department of Motor Vehicles, birth and death registries, Social Security registries, etc. are public data elements.
Private data elements, on the other hand, are pieces of information that don't show up in public data bases. They're also called "keys". The private word, initials or set of numbers you use to access your account when you go to an automatic teller machine is a good example of a key. You control it. Even if somebody steals your bank card, he or she can't get into your account without your private key. More about these later.
When you select common data elements to use in a UI system, you're looking for pieces of information that everyone has, that don't change over time and that people don't mind giving at HIV test sites. This can be trickier than it sounds. Maryland chose to use the last four digits of the Social Security number, for example, not realizing the extent to which people either didn't know their Social Security number or didn't want to give it out. In retrospect, the Maryland folks believe that their system (which works well, overall) might work even better if they hadn't selected this particular data element.
Having a reproducible UI is important because, without it, people tested more than once will be assigned a different UI each time they are tested. This results in undetectable duplications -- people listed in the HIV registry more than once -- which throws off the accurate epidemiological picture we are trying to obtain of how many people are living with HIV and in what populations. This is also why random or sequential numbers don't work. They can be used to count the number of tests done but can't count the number of individual people tested with any accuracy.
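Here is a small Python sketch of that difference, using made-up test records. The `derived_ui` function and the record layout are hypothetical, chosen only to show counting tests versus counting people.

```python
from itertools import count

# Made-up test events: the first and third rows are the same person tested twice.
tests = [
    {"ssn_last4": "6789", "birth": "031465"},
    {"ssn_last4": "1234", "birth": "110772"},
    {"ssn_last4": "6789", "birth": "031465"},
]


def derived_ui(record: dict) -> str:
    """Reproducible: the same data elements always give the same code."""
    return record["ssn_last4"] + record["birth"]


# Sequential numbers: a fresh number for every test, whoever is being tested.
counter = count(1)
sequential_ids = [next(counter) for _ in tests]

# Derived codes: repeat tests by one person collapse to a single code.
derived_ids = [derived_ui(t) for t in tests]

print(len(set(sequential_ids)))  # 3 distinct IDs, one per test
print(len(set(derived_ids)))     # 2 distinct IDs, one per person
```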
Low duplication rate means that it is very unlikely that two people will be assigned the same UI. Duplication can't be eliminated entirely (even name reporting has some duplication) but it can be reduced by using a good UI system. Soundex (frequently used as a UI system even though it's not technically a UI) has a duplication rate of approximately 10% - 20% -- too high by most people's standards. By contrast, the Unique Record Number System (designed by HRSA, the federal entity that administers the Ryan White Care Act) has a duplication rate of only .02% - .04% -- an acceptable rate in most people's books. The duplication rate you get depends on the data elements you select and the rules you establish (also called the algorithm) for creating the UI.
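One way to put a number on duplication is sketched below in Python. It assumes the rate is measured as the share of people whose UI is also assigned to somebody else. The article does not say exactly how the published rates were calculated, so treat this definition, and the made-up codes, as an illustration only.

```python
from collections import Counter


def duplication_rate(ui_by_person: dict) -> float:
    """Share of people whose UI is also assigned to somebody else."""
    if not ui_by_person:
        return 0.0
    counts = Counter(ui_by_person.values())
    clashing = sum(1 for ui in ui_by_person.values() if counts[ui] > 1)
    return clashing / len(ui_by_person)


# Made-up people: two of them happen to share every data element in the UI.
people = {
    "person_a": "678903146522",
    "person_b": "123411077211",
    "person_c": "678903146522",  # same code as person_a
    "person_d": "555501018012",
}

print(f"{duplication_rate(people):.0%}")  # 50%
```

Adding another data element to the UI is the usual way to pull two clashing people apart, which is why the choice of data elements drives the rate.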
This sounds really complicated. Do UI's actually work? Is anyone using them?
YES! They are being used successfully in a number of settings to protect health-related information (not to mention being used constantly in the banking/business side of life). Two states, Maryland and Texas, are already using them for HIV case reporting. Texas hasn't been too happy with its experience, mainly because its system is desperately underfunded and because it has had a low level of "buy-in" by providers and local health departments.
Maryland's UI system is also underfunded but both the Health Department and the consumer/advocacy communities view it as a success. It enables the Health Department to collect and track information without causing testing avoidance or putting the privacy of people with HIV at risk. Maryland is already using the information it has collected through the UI system to make funding allocation decisions and plan targeted prevention and care programs.
UI-based (as opposed to name-based) HIV case reporting systems are also in place in Australia, Denmark, Belgium and the United Kingdom. And beyond HIV, states are also using UI's to protect people's privacy in all kinds of sensitive, health-related situations.
New York state, for example, uses a UI in place of the woman's name on "fetal death certificates" (documentation of miscarriages and abortions). In Massachusetts, the state Health Department uses UI's on the records of people receiving state-funded mental health care. Pennsylvania, similarly, uses UI's in place of names to aggregate information about people receiving services through the state Office of Drug and Alcohol Programs. So there's nothing new or unusual about the idea of using UI's to track health care information.
But can't UI's be cracked? Why should I trust them?
Imagine a continuum (a big, long line). Put "absolute privacy" on the right end of the line and "no privacy at all" on the left end. All UI systems fall somewhere along that line, between the two extremes. You can put Social Security numbers up on the left end next to "no privacy at all" because, as we all know, practically everybody has your Social Security number on file. It's a UI but it's not one that guarantees any privacy.
Near the right end you can put UI's that combine a key (private data element) with the public data elements discussed above.
UI's made entirely of public data elements fall along the middle of the line. In the middle slightly toward the right, you can put computerized encryption systems that use only public data elements (the HRSA system mentioned above is one of those). Manually encrypted systems that use only public data elements (like the Maryland system) go in the middle slightly toward the left.
Computerized encryption systems give you somewhat more security than manual ones because more complex algorithms are harder to crack manually than simple ones. Since a computer can whiz through the process of encrypting (mixing up) the information, it can handle really fancy algorithms that have lots of steps in the same amount of time as it takes a human to carry out a simple algorithm. This means that a computer encrypted UI is less likely to be cracked by someone who is just trying to do it casually (a curious "browser" who somehow obtains a list of UI's). Even computer encrypted UI's, however, can be cracked by someone with a computer (even a laptop) and access to the UI algorithm.
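To make the manual-versus-computer difference concrete, here is a rough Python sketch. SHA-256 is used purely as a stand-in for "a really fancy algorithm with lots of steps". It is not the HRSA algorithm, which this article does not describe. The function names and the choice to keep only twelve characters are also assumptions for the example.

```python
import hashlib


def manual_ui(ssn_last4: str, birth: str) -> str:
    """A rule simple enough to apply by hand: just join the elements."""
    return ssn_last4 + birth


def computed_ui(ssn_last4: str, birth: str) -> str:
    """A many-step scramble that only a computer can do quickly.

    SHA-256 is a stand-in here, not any real reporting system's algorithm.
    """
    digest = hashlib.sha256(f"{ssn_last4}|{birth}".encode("utf-8")).hexdigest()
    return digest[:12]  # shortened for readability (an assumption)


print(manual_ui("6789", "031465"))    # the inputs are readable at a glance
print(computed_ui("6789", "031465"))  # looks like gibberish to a casual browser

# Still reproducible: the same inputs give the same code every time.
print(computed_ui("6789", "031465") == computed_ui("6789", "031465"))  # True
```

The scrambled code defeats the casual browser, but anyone holding the same function and the same inputs gets the same output, which is exactly the weakness the last sentence above describes.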
People tend to assume that computer encrypted UI systems are too expensive, too difficult and would require every HIV testing provider to have a computer on site. But what if you used a centralized, call-activated computer that providers accessed via touch tone phone? The provider could call up the computer and punch in the necessary data elements using the dialing buttons. The computer at the other end could crunch up the data and read back the UI. How hard is that?
OK, I get that. But what are those UI's all the way on the right about?
They're the ones that are more secure because they include a key (private data element). Here's how they work.
Put on a black ski mask and imagine that you're trying to crack a UI system (Mission Impossible music rises in the background). No matter how good the encryption, any system that uses only public data elements can be cracked. You just need three things:
1) a computer
2) a secondary data base that has all the necessary data elements in it
3) a copy of the algorithm used to produce the UI's.
Your secondary data base could be Social Security records, a drivers license registry or anything that shows the data elements required for that particular UI system. You process the secondary data base through the algorithm to produce a UI for each of the names on the secondary data base. Then you just cross-match those UI's against the original list of UI's you're trying to crack. Every time you find a match, you've identified a person on the original list of UI's. This is called "cracking by cross-matching".
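The two steps of cracking by cross-matching can be sketched in a few lines of Python. Everything here is invented for illustration: the names, the records, and `ui_algorithm`, which uses SHA-256 as a stand-in for whatever algorithm the attacker has obtained.

```python
import hashlib


def ui_algorithm(ssn_last4: str, birth: str) -> str:
    """Stand-in for the obtained UI algorithm (public data elements only)."""
    return hashlib.sha256(f"{ssn_last4}|{birth}".encode("utf-8")).hexdigest()[:12]


# The list being attacked: UI codes with no names attached.
hiv_registry_uis = {ui_algorithm("6789", "031465"), ui_algorithm("4321", "090959")}

# The secondary data base: names plus the same public data elements.
secondary_db = [
    {"name": "Pat Example", "ssn_last4": "6789", "birth": "031465"},
    {"name": "Sam Sample", "ssn_last4": "1234", "birth": "110772"},
]

# Step 1: run every named record through the algorithm.
ui_to_name = {ui_algorithm(r["ssn_last4"], r["birth"]): r["name"] for r in secondary_db}

# Step 2: cross-match against the list being attacked.
identified = {ui: name for ui, name in ui_to_name.items() if ui in hiv_registry_uis}

print(list(identified.values()))  # ['Pat Example']
```

How well the algorithm scrambles makes no difference. The attack works because every input it needs is sitting in the secondary data base.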
Now (take a deep breath, we're almost out of the technical part), what if the UI system you're trying to crack is one that scrambles a key in along with the public data elements? Remember the key is a private word or number that won't show up in a public records data base. If it's a keyed system, you might as well take off the black ski mask because you won't be able to generate the second set of UI's. The public records data base can't provide all the data elements you need to produce the UI because it doesn't include the keys. Incorporating a private key selected or controlled by the consumer effectively jams any effort to crack the UI system by cross matching.
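Here is a minimal Python sketch of why the key jams the attack. HMAC with SHA-256 is my stand-in for "scrambling a key in along with the public data elements". It is not the method of any particular keyed system, and the key phrase and function name are made up.

```python
import hashlib
import hmac


def keyed_ui(ssn_last4: str, birth: str, private_key: str) -> str:
    """Scramble a consumer-held key in with the public data elements."""
    message = f"{ssn_last4}|{birth}".encode("utf-8")
    return hmac.new(private_key.encode("utf-8"), message, hashlib.sha256).hexdigest()[:12]


# The UI on file was made with a phrase only the consumer knows.
ui_on_file = keyed_ui("6789", "031465", "purple giraffe")

# The attacker has the algorithm and the public records, but must guess the key.
attacker_attempt = keyed_ui("6789", "031465", "password")

print(attacker_attempt == ui_on_file)                              # False
print(keyed_ui("6789", "031465", "purple giraffe") == ui_on_file)  # True
```

With the right key the code is still reproducible, so the consumer can always regenerate it. Without the key, the public records alone produce a code that matches nothing.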
I know of two keyed systems (although I'm sure there are many others). One is Client Key, a system in which the private word or phrase selected by the consumer is his/her key. The second is a system called DoubleLock, in which the private data element is a set of hand size measurements. Unless the consumer is present and willing to put his/her hand down on a piece of paper for measuring, the DoubleLock UI can't be produced.
Given all this, you'd think that people committed to privacy would automatically want to use keyed UI systems for HIV case reporting. BUT... because they can't be cracked by cross-matching, they also can't be used to cross-match HIV data against other, relevant data bases such as:
* the AIDS registry (to prevent having a lot of people listed twice, once on the HIV list and again on the AIDS registry),
* the national death registry (to make sure that people who have died are removed from the HIV registry so that it's current),
* the ADAP records (to find out what proportion of people with HIV in your state are getting ADAP assistance), etc.
In both Maryland and Texas, the process described above is exactly how they determine how many of the people in the HIV registry are also in the AIDS registry, the ADAP registry, etc. They produce UI's for everyone in those registries and then cross match those UI's against the HIV registry. But if states use a keyed UI system for HIV reporting, they won't be able to generate UI's for that second data base because they'll be missing one of the necessary data elements.
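The legitimate version of cross-matching is the same matching step put to a different use. This Python sketch assumes UI's have already been produced for both registries. The codes and the function name are made up for illustration.

```python
def share_also_listed(hiv_uis: set, other_uis: set) -> float:
    """Fraction of HIV-registry UIs that also appear in another registry."""
    if not hiv_uis:
        return 0.0
    return len(hiv_uis & other_uis) / len(hiv_uis)


# Made-up UI codes for illustration.
hiv_registry = {"678903146522", "123411077211", "555501018012", "987612319021"}
adap_registry = {"123411077211", "987612319021", "000002029911"}

print(f"{share_also_listed(hiv_registry, adap_registry):.0%}")  # 50%
```

No names are needed to get the answer, but the health department does need to be able to produce a UI for every record in the second registry. That is the step a keyed system blocks.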
How important is that?
Only you and your state can decide.
What I recommend to states seriously considering this issue is that you form a Working Group that includes people with HIV/AIDS, Health Department personnel, testing providers and one or more UI experts to serve as technical advisors. Selecting a UI system is like buying a car -- you have to consider and discuss the advantages and disadvantages of a lot of models before choosing what you need and can afford.
The Working Group needs to ask itself these questions:
1) what data elements don't change over time and are supplied without objection?
2) what's the highest level of confidentiality we can agree on?
Is the state insisting that our UI system has to be one that can be cross-matched against other data bases? Can we come up with a way to get the ancillary information we need while using a keyed UI system for HIV case reporting? Can we agree to "grandfather it in" (i.e. start assigning keyed UI's to everyone with HIV/AIDS now with the understanding that, eventually, the AIDS registry and ADAP registries will be made up entirely of people with keyed UI's so we can cross-match at least those registries against the HIV list)?
3) what's the highest level of confidentiality we can afford? Adopting any kind of HIV case reporting system is going to cost money.
Maryland, for example, is spending about $100,000 per year to process 7500 UI reports of HIV infection. In comparison, New York is spending at least $200,000 per year on the five additional staff they had to add to process 14,400 name-based reports of low CD4 test results per year. That works out to roughly $13 per report in Maryland and roughly $14 per report in New York, so the two systems are comparable in cost.
4) is the system we're considering user-friendly enough that providers will comply when required to use it?
This is a tough set of questions and the Working Group may have to sweat to hammer out an acceptable compromise. Like all good compromises, it will probably make both sides a little unhappy. You may not get as much privacy protection as you want and the Health Department will have to cope with a system that, let's face it, isn't going to be as easy for them to use as name reporting would be.
Remember that the UI expert(s) should be there as a technical advisor only. The Working Group should think carefully about the first three questions before addressing number four. Technology should serve people; people shouldn't have to conform to the technology. Once you figure out what you want, it's the UI expert's job to find or create a system that meets your requirements and that is user-friendly enough for providers. Don't let yourself get bulldozed into a system that doesn't work for you.
Above all, don't let anyone tell you that implementing a UI-based HIV case reporting system in your state is impossible. Would you look at a Volkswagen Beetle, decide it's too small, look at a Mercedes, decide it's too expensive and, on that basis, decide that you don't really want a car? Heck, no! You'd just keep shopping!
Finding the right UI system and getting your state to use it may be a labor-intensive advocacy process, but think about it. Isn't your privacy worth it?
ACKNOWLEDGMENT: With sincere thanks to Walter Cuirle, who taught me practically everything I know on this subject.
Anna Forbes, MSS
Ardmore, PA 19003
The Active Voice, December 1997