Is AI Running the Government? Here’s What We Know

The Trump administration is letting the generative AI chatbots loose.

Federal agencies such as the General Services Administration and the Social Security Administration have rolled out ChatGPT-esque tech for their workers. The Department of Veterans Affairs is using generative AI to write code.

The U.S. Army has deployed CamoGPT, a generative AI tool, to review documents to eliminate references to diversity, equity, and inclusion. More tools are coming down the line. The Department of Education has proposed using generative AI to answer questions from students and families on financial aid and loan repayment.

Generative AI is meant to automate tasks that government workers previously performed; the federal workforce is projected to shrink by 300,000 jobs by the end of the year.

But the technology isn’t ready to take on much of this work, says Meg Young, a researcher at Data & Society, an independent nonprofit research and policy institute in New York City.

“We’re in an insane hype cycle,” she says.

What does AI do for the American government?

Currently, government chatbots are largely meant for general tasks, such as helping federal workers write e-mails and summarize documents. But you can expect government agencies to give them more responsibilities soon. And in many cases, generative AI is not up to the task.

For example, the GSA wants to use generative AI for tasks related to procurement, the legal and bureaucratic process by which the government purchases goods and services from private companies. An agency would go through procurement, for instance, to find a contractor when constructing a new office building.

The procurement process involves lawyers from the government and the company negotiating a contract that ensures the company abides by government regulations, such as transparency requirements or the Americans with Disabilities Act. The contract may also specify which repairs the company is liable for after delivering the product.

It’s unclear whether generative AI will speed up procurement, according to Young. It could, for example, make it easier for government employees to search and summarize documents, she says. But lawyers may find generative AI too error-prone for many steps of the procurement process, which involve negotiations over large amounts of money. Generative AI may even waste time.

Lawyers have to carefully vet the language in these contracts. In many cases, they have already agreed on the accepted wording.

“If you have a chatbot generating new terms, it’s creating a lot of work and burning a lot of legal time,” says Young. “The most time-saving thing is to just copy and paste.” 

Government workers also need to be vigilant when using generative AI on legal topics, as chatbots are not reliably accurate at legal reasoning. A 2024 study found that chatbots specifically designed for legal research, released by the companies LexisNexis and Thomson Reuters, made factual errors, or hallucinations, 17% to 33% of the time.

While companies have released new legal AI tools since then, the upgrades suffer from similar problems, says Faiz Surani, a co-author of the 2024 study.

What kinds of mistakes does AI make?

The types of errors are wide-ranging. Most notably, in 2023, lawyers representing a client suing Avianca Airlines were sanctioned after they cited nonexistent cases generated by ChatGPT. In another example, a chatbot trained for legal reasoning said that the Nebraska Supreme Court overruled the United States Supreme Court, Surani says.

“That remains inscrutable to me,” he says. “Most high schoolers could tell you that’s not how the judicial system works in this country.”

Other types of errors can be more subtle. The study found that the chatbots have difficulty distinguishing between a court’s decision and a litigant’s argument. The researchers also found examples in which a chatbot cited a law that had been overturned.

Surani also found that the chatbots sometimes fail to recognize inaccuracies in the prompt itself. For example, when prompted with a question about the rulings of a fictional judge named Luther A. Wilgarten, the chatbot responded with a real case.

Legal reasoning is especially tricky for generative AI because courts overrule cases and legislatures repeal laws. As a result, statements about the law “can be 100% true at a point in time and then immediately cease to be true entirely,” says Surani.

He explains this in the context of retrieval-augmented generation, a technique that legal chatbots commonly used a year ago. In this approach, the system first retrieves a few relevant cases from a database in response to a prompt and then generates its output based on those cases.

But this method still often produces errors, the 2024 study found. When asked whether the U.S. Constitution guarantees a right to abortion, for example, a chatbot might retrieve Roe v. Wade and Planned Parenthood v. Casey and answer yes. But it would be wrong: Roe was overruled by Dobbs v. Jackson Women’s Health Organization in 2022.
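The failure Surani describes can be sketched with a toy retrieval-augmented pipeline. Everything in this sketch, including the three-case corpus, the keyword-overlap scoring, and the stand-in “generate” step, is illustrative and does not reflect any vendor’s actual system:

```python
# Toy sketch of retrieval-augmented generation (RAG) for legal Q&A.
# The corpus, scoring, and "generate" step are illustrative stand-ins.

CASE_DB = [
    {"name": "Roe v. Wade",
     "text": "constitution abortion right privacy",
     "overruled": True},
    {"name": "Planned Parenthood v. Casey",
     "text": "constitution abortion undue burden",
     "overruled": True},
    {"name": "Dobbs v. Jackson Women's Health Organization",
     "text": "dobbs states may regulate abortion",
     "overruled": False},
]

def retrieve(query, db, k=2):
    """Rank cases by naive keyword overlap with the query; return the top k."""
    words = set(query.lower().split())
    ranked = sorted(db, key=lambda case: -len(words & set(case["text"].split())))
    return ranked[:k]

def naive_answer(query, db):
    """Stand-in for the LLM step: trust whatever was retrieved as good law."""
    cases = retrieve(query, db)
    return "Yes, per " + " and ".join(case["name"] for case in cases)

print(naive_answer("does the constitution guarantee a right to abortion",
                   CASE_DB))
# The top-ranked matches are Roe and Casey, both overruled, so this
# pipeline confidently answers yes.
```

Because the retriever ranks cases only by topical similarity, the overruled precedents score highest and the generation step repeats them as good law. A more careful pipeline would check each retrieved authority’s current validity before generating, which is exactly the step the naive version omits.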

In addition, the law itself can be ambiguous. For example, the tax code isn’t always clear about what counts as a deductible medical expense, so courts end up weighing individual cases.

“Courts have disagreements all the time, and so the answer, even for what seems like a simple question, can be pretty unclear,” says Leigh Osofsky, a law professor at the University of North Carolina, Chapel Hill.

Are your taxes being handed to a chatbot?

While the Internal Revenue Service doesn’t currently offer a generative AI-powered chatbot for public use, a 2024 IRS report recommended further investment in AI capabilities for such a chatbot.

To be sure, generative AI could be useful in government. A pilot program in Pennsylvania, run in partnership with OpenAI, found that using ChatGPT saved employees an average of 95 minutes per day on administrative tasks such as writing e-mails and summarizing documents.

Young notes that the researchers administering the program did so in a measured way, by letting 175 employees explore how ChatGPT could fit into their existing workflows.

But the Trump administration has not shown similar restraint.

“This process that they’re following shows that they do not care if the AI works for its stated purpose,” says Young. “It’s too fast. It’s not being designed into specific people’s workflows. It’s not being carefully deployed for narrow purposes.”

The administration released GSAi, the General Services Administration’s chatbot, to 13,000 people on an accelerated timeline.

In 2022, Osofsky and her collaborators conducted a study of automated government legal guidance, including chatbots. The chatbots they studied did not use generative AI. Their study makes several recommendations to the government about chatbots meant for public use, like the one proposed by the Department of Education.

They recommend that the chatbots come with disclaimers informing users that they’re not talking to a human. A chatbot should also make clear that its output isn’t legally binding.

Right now, if a chatbot tells you you’re allowed to deduct a certain business expense, but the IRS disagrees, you can’t force the IRS to follow the chatbot’s response, and the chatbot should say so in its output.

Government agencies also need to adopt “a clear chain of command” showing who is in charge of creating and maintaining these chatbots, says Joshua Blank, a law professor at the University of California, Irvine, who collaborated with Osofsky on the study.

During their study, they often found the people developing the chatbots were technology experts who were somewhat siloed from other employees in the department. When the agency’s approach to legal guidance changed, it wasn’t always clear how the developers should update their respective chatbots.  

As the government ramps up use of generative AI, it’s important to remember that the technology is still in its infancy. You may trust it to come up with recipes and write your condolence cards, but governance is an entirely different beast.

Tech companies don’t know yet which AI use cases will be beneficial, says Young. OpenAI, Anthropic, and Google are actively looking for these use cases by partnering with governments.

“We’re still at the earliest days of assessing what AI is and isn’t useful for in governments,” says Young.
