Keeping your personal data safe
Anyone who has ever filled out a government form has likely wondered: Why do I need to provide this information again? Doesn’t the government already have it? Can’t the different agencies just share data with each other?
On the surface, this makes sense. The U.S. Postal Service knows your address. The Social Security Administration knows where you were born. If you’ve ever filed taxes, the IRS has everything from your address and employer to the number of children in your home. It would be a lot easier if, the next time you had to submit paperwork to the government (say, to apply for benefits or renew a passport), the form could just auto-fill everything the government already knows about you. So, why doesn’t it?
There is a lot of conversation about this topic in the news right now. You’ll hear the term “data pool” or “data lake” referring to a shared space where different government agencies can dump all the data they have on citizens into a common source, and then pull from it as needed.
But this isn’t a new idea. It’s one that has been considered, reconsidered, and left alone many times over the years. As part of Technology Transformation Services’ effort to modernize government systems, 18F often played a part in these conversations. And there are good reasons to think twice about an approach like that.
Laws about how the government handles data
There are two main laws that deal with how the government handles your personal information. First, there’s the Privacy Act of 1974, which generally prohibits agencies from disclosing records about you without your written consent, except in specific circumstances the law spells out. It requires agencies to protect your information, be transparent about how it is being used, and announce any changes through a public notice and comment period.
Then there’s the E-Government Act of 2002, which expanded privacy protections as more recordkeeping became digitized. It says that before an agency can use a system to house personal data, it has to document a thorough assessment of how that system handles and protects the data. It also requires agencies to make that report public and to reassess the system regularly. Basically, it’s a “show us the receipts” approach to proving that every government system is keeping your data safe and using it only for the purposes you were told about.
These laws would seem to prevent anyone from just dumping everyone’s data into the same place. To be clear, they don’t make that impossible. They just put checks in place to make sure that you’ve allowed your data to be shared, you’re aware of who will have access to it, and you can access documentation that verifies how it is being protected. It would take a great deal of time and effort to do that carefully and legally. It’s definitely not a “launch this in a few months” type of undertaking.
Best practices for data security
Data security isn’t a new challenge. It’s been around long enough that there is well-established guidance to follow. Here are some best practices we followed in all our 18F projects, and how a common data pool might contradict them.
Access should be “need to know”
Everyone with access to a system should have only the level of access they need to do their job. People who don’t actually do the work should not get access at all. This means someone’s rank or title in the organization doesn’t determine their access to a system. Your data should be seen only by people who are specifically authorized to view it as part of their job.
A system should only collect what it absolutely needs
When collecting private data from users, a system should collect only what it needs from them, nothing more. That way, the agency has what it needs without putting other information at risk by keeping it unnecessarily. This is why different government services don’t all collect the same information. There’s no need for the IRS to know what school your child goes to, for example, even if the Department of Education might have it. And the U.S. Postal Service has no reason to know your salary, or whether you pay alimony to an ex, even though the IRS does.
You can see how a common data pool might be a challenge here. If Agency A collects data items 1-4 from you, and Agency B collects items 4-7, there is some overlap in what they need, but not much. If the agencies combine their data, people at both agencies now have access to items 1-7. Do both agencies take the same security precautions in their systems? Will Agency B also use items 1-3, even though you never gave it permission to have that information in the first place?
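To make that overlap concrete, here is a small hypothetical sketch in Python. The numbered items are placeholders for data fields, not any real agency’s records. It treats each agency’s collection as a set and shows what pooling does to who can see what.

```python
# Hypothetical illustration: each agency collects only the data items it needs.
agency_a_items = {1, 2, 3, 4}
agency_b_items = {4, 5, 6, 7}

shared = agency_a_items & agency_b_items  # items both agencies already collect
pooled = agency_a_items | agency_b_items  # everything visible once the data is combined

# Items Agency B could now see that you never gave to Agency B.
new_to_b = pooled - agency_b_items

print(sorted(shared))    # [4]
print(sorted(pooled))    # [1, 2, 3, 4, 5, 6, 7]
print(sorted(new_to_b))  # [1, 2, 3]
```

Only one item was genuinely shared, yet pooling hands each agency three items it never asked you for.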
More ways to protect personal data
Other best practices that might be a challenge are:
- Audit trail. All actions in a system should be logged so there is a record of who is accessing what data when, and what they are doing with it.
- Data removal. Once an agency doesn’t need a piece of data anymore, it should be safely deleted or archived. This might be especially difficult in a shared space where agencies have entirely different needs.
- Data encryption. Data should be encrypted anytime it is stored or transmitted. In a shared data pool, who gets the decryption keys? Which people, at which agencies?
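As a rough illustration of how “need to know” access and an audit trail work together, here is a minimal Python sketch. The users, roles, field names, and in-memory log are all hypothetical. A real government system would rely on its own identity management and tamper-resistant logging, not a Python list.

```python
from datetime import datetime, timezone

# Hypothetical "need to know" rules, keyed by job role rather than rank or title.
# Each role lists only the fields that role needs to do its work.
ROLE_FIELDS = {
    "benefits_caseworker": {"name", "address"},
    "tax_examiner": {"name", "salary"},
}

# Hypothetical users and the job role each one holds.
USER_ROLES = {"alice": "benefits_caseworker", "bob": "tax_examiner"}

# In-memory audit trail for illustration only; real systems need tamper-resistant storage.
audit_log = []


def read_field(user, record, field_name):
    """Return one field from a record if the user's role needs it, logging every attempt."""
    role = USER_ROLES.get(user)
    allowed = field_name in ROLE_FIELDS.get(role, set())
    # Record who asked for what, and when, whether or not access was granted.
    audit_log.append({
        "time": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "role": role,
        "field": field_name,
        "allowed": allowed,
    })
    if not allowed:
        raise PermissionError(f"{user} is not authorized to view {field_name}")
    return record[field_name]


# Example usage with a made-up record.
record = {"name": "Jane Doe", "address": "123 Main St", "salary": 50000}
print(read_field("alice", record, "address"))  # 123 Main St
try:
    read_field("alice", record, "salary")
except PermissionError as err:
    print(err)  # alice is not authorized to view salary
print(len(audit_log))  # 2
```

The denied request is logged too. An audit trail that records only successful access would miss exactly the attempts reviewers most want to see.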
These are all complex issues that would, again, take a massive effort to resolve. It would be more convenient to put these practices aside, but that’s the point. Security often isn’t convenient. But it is necessary.
Risks of not securing personal data
So what are the risks if the right laws and guidance aren’t followed in creating this kind of mega database? For one thing, it creates a very tempting target for bad actors. As government employees, we had detailed training every year on the importance of protecting personal data entrusted to the U.S. government. It included reminders of the many ways someone could try to access protected information, from directly hacking a system to bribing someone who already has access. Right now, an attacker would have to do that to many different systems to get all that data. Combine the data, and they only need to succeed once.
A database like this would also carry the same risks as any other system: things can break. In recent years, even brief system outages have caused major disruptions to air travel and office communication. If every government system relies on the same data source and that source goes down, what happens to everyday Americans trying to apply for benefits, get passports, pay their taxes, or do anything else that touches government systems?
18F balanced security with efficiency
None of this means you can’t make government more effective, or cut costs, while keeping Americans’ personal data secure. 18F’s work often did both. We worked with our agency partners to improve processes, simplify systems, and remove redundancies, all while keeping information security front of mind.
We worked with 10X to create a dashboard for the GSA Privacy Office that made it easier and more efficient for the agency to keep track of Americans’ personal data, saving the agency hundreds of hours of work each year. We had hoped to scale that solution to other agencies as well, but didn’t get the chance.
Many of our projects had to go through the authorization to operate (ATO) process, in which an authorized group assesses a system and verifies that it meets all the security requirements needed to be trusted with government data. This was a necessary step, but it took a long time (6-18 months) and slowed projects down. 18F put a team on this challenge, dug into each step of the process, and redesigned it to take just 30 days! Then we shared how we did it, hoping other agencies could use our lessons learned to speed up their processes, too.
18F demonstrated that government can balance security with efficiency while still following laws and policies. And even though we’ve been eliminated, we want the public to know that their data should be held using the best security standards, and according to the laws that government agencies and employees are required to follow. These are important checks put in place to make sure your personal information is safe.