Beyond the Hype: What We Know So Far About the Value of AI Tools for Lawyers

Jennifer Ballard, Good Journey Consulting

In November 2025, Stephen Embry of Above the Law issued a wake-up call to outside counsel: in-house lawyers are increasingly using AI tools and experiencing efficiency gains, and it is only a matter of time before they require the same from their outside lawyers.

However, identifying one or more AI tools that will be the best fit for law practice is a daunting task. There are hundreds of AI tools for lawyers crowding the market, and there is significant hype surrounding the AI industry in general. Fortunately, there is a growing number of independent efforts to evaluate the real-world utility of AI tools for lawyers. These studies offer some data about AI tools at fixed points in time that can be used to make better-informed decisions about AI tool selection.

Independent evaluations of AI tools for lawyers

Below are summaries of seven such independent studies, which individually and collectively reveal helpful insights into where it may (or may not) currently be worthwhile to integrate AI tools with your practice.

Contract drafting study 

A September 2025 contract drafting study from Legalbenchmarks.ai, a collaboration between legal professionals, AI experts, and researchers, evaluated 13 AI tools (seven legal industry AI tools and six general-purpose AI tools) against a human baseline that consisted of in-house commercial lawyers with an average of 10 years of working experience. The legal industry AI tools included in the study were August, Brackets, GC AI, InstaSpace, SimpleDocs, Wordsmith, and an anonymous tool, while the general-purpose tools were ChatGPT, Claude, Copilot, Gemini, Le Chat, and Qwen. The study found that some AI tools outperformed the human baseline in producing reliable first drafts of contracts. The study did not find a meaningful difference in the output reliability or output usefulness between the general-purpose and legal industry AI tools. The top performing tools for output were Gemini, ChatGPT, GC AI, Brackets, August, and SimpleDocs. The study concluded that while the legal industry AI tools were not outperforming general-purpose AI tools on output, they were beginning to differentiate themselves with workflow and support functionalities for lawyers, such as integrating with Microsoft Word, and offering clause libraries and templates. The most meaningful differentiator the study found among the legal industry AI tools was whether the tool integrated with existing workflow and technology. For workflow integration or support, the top performers were Brackets, GC AI, and SimpleDocs. You can read this study here.

Information extraction study 

A second study from Legalbenchmarks.ai, released in April 2025, focused on information extraction tasks for in-house lawyers. This study evaluated six AI tools: two legal industry AI tools (GC AI and Vecflow’s Oliver), two general-purpose AI assistant tools (Google’s NotebookLM and Microsoft Copilot), and two general-purpose LLM chatbots (DeepSeek and ChatGPT). All of the AI tools were scored on both accuracy and usefulness. The study found that the two legal industry AI tools, GC AI and Oliver, received the highest combined scores, and concluded that while general-purpose AI tools could match legal industry AI tools in accuracy, the legal industry AI tools delivered more value in usability and workflow integration. You can read this study here.

Vals Legal AI Report

In February 2025, Vals AI, a platform that seeks to advance generative AI with independent and scalable evaluation infrastructure, released the Vals Legal AI Report (VLAIR), which evaluated four legal industry AI tools (CoCounsel, Harvey Assistant, Oliver, and Vincent AI) and compared the results to a lawyer control group. The tools were evaluated across up to seven tasks commonly performed by lawyers (each company could opt into as many of the task evaluations as desired). One or more AI tools beat the lawyer control group on four tasks (document extraction, document question-answering, document summarization, and transcript analysis), while the lawyer control group surpassed the AI tools on two tasks (redlining and EDGAR research) and matched the highest performing tool on one task (chronology generation). Harvey Assistant, which participated in six of the seven tasks, had the strongest performance, receiving the top score on five tasks and the second-place score on one task, and beating or matching the lawyer control group in five tasks. This study can be accessed here.

VLAIR—Legal Research

In October 2025, Vals AI released an extension of VLAIR focusing on legal research. VLAIR—Legal Research evaluated three legal industry AI tools (Alexi, Counsel Stack, and Midpage), as well as ChatGPT and a human baseline of lawyers from one law firm who were all experienced in conducting legal research. The study involved 200 legal research questions. The AI tools and the lawyer baseline were each given a weighted score, with 50% of the score given to accuracy, while 40% was given to authoritativeness, meaning whether the response was supported by citations to proper sources, and 10% of the score was given to appropriateness, meaning whether the response was easily understood and could be shared as-is with others. The study found that the legal industry AI tools received the highest weighted scores, ranging from 76% to 78%, followed by ChatGPT at 74%, with the lawyer baseline scoring the lowest at 69%. Counsel Stack had the highest score of the legal industry AI tools.
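The weighting described above amounts to simple arithmetic, sketched below in Python. The 50/40/10 weights come from the study as summarized here; the function name and the component scores in the example are hypothetical and are not taken from the report, which may compute its scores differently in detail.

```python
# Weights as described in the summary above: accuracy 50%,
# authoritativeness 40%, appropriateness 10%.
WEIGHTS = {
    "accuracy": 0.50,
    "authoritativeness": 0.40,
    "appropriateness": 0.10,
}


def weighted_score(component_scores: dict) -> float:
    """Combine per-criterion scores (each 0-100) into one weighted score."""
    return sum(WEIGHTS[name] * component_scores[name] for name in WEIGHTS)


# Hypothetical component scores, for illustration only (not from the study).
example = {"accuracy": 80.0, "authoritativeness": 75.0, "appropriateness": 90.0}
print(round(weighted_score(example), 1))
```

With these made-up inputs, the weighted score is 79.0 (40 points from accuracy, 30 from authoritativeness, and 9 from appropriateness), which shows how heavily accuracy and citation support dominate the final figure.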

Notably, the study found that when the AI tools outperformed the lawyer baseline, they did so by a large margin. Of the 200 questions included in the study, AI tools outperformed the lawyer baseline on 150 of the questions, and the average point margin was 31%. In contrast, when the lawyer baseline outperformed the AI tools, it was by an average point margin of 9%, and it typically did so on questions involving complex multi-jurisdictional analysis, judgment-based synthesis, or a need for a deeper understanding of context. You can read this study in its entirety here.

Vals AI LegalBench contributions 

In 2023, researchers created a benchmark called LegalBench, which included 162 legal reasoning tasks evaluated across 20 large language models (LLMs). Benchmarks are datasets and tasks that have been standardized to measure the capabilities of an AI model across an industry. Vals AI contributed to the LegalBench benchmark with a December 2025 update, which evaluated 92 AI models on legal tasks, finding that the top performing AI models were: (1) Gemini 3 Pro (87.04% accuracy), (2) Gemini 3 Flash (86.86% accuracy), and (3) GPT 5 (86.02% accuracy). You can read more about Vals AI’s contribution to LegalBench here.

Law student study 

The University of Minnesota published a study in March 2025 called AI-Powered Lawyering: AI Reasoning Models, Retrieval Augmented Generation, and the Future of Legal Practice. In this study, law students tested two AI tools on six legal tasks: Vincent AI, a legal industry AI tool that was refined using retrieval augmented generation (RAG), and OpenAI’s o1-preview, an AI reasoning model. The study found that one or both AI tools significantly enhanced the quality of the legal work, compared to the legal work performed without AI, in five out of six tasks: (1) drafting an email for a client, (2) drafting a legal memo for a partner, (3) analyzing a complaint and drafting a written analysis, (4) drafting a motion to consolidate, and (5) drafting a persuasive letter. Additionally, the study found that both AI tools significantly boosted productivity in the same five out of six legal tasks, with particular strength in tasks like analyzing complaints and drafting persuasive letters. Neither tool demonstrated improvement in quality or efficiency for the sixth task, drafting a non-disclosure agreement. The study noted that this was the only task where participants were provided a general template to use in their response, which may have reduced the potential for AI-driven quality improvement. You can read this study in its entirety here.

Legal research hallucination study 

Stanford RegLab published a preprint study in May 2024 called Hallucination-Free? Assessing the Reliability of Leading AI Legal Research Tools. This study tested OpenAI’s GPT-4 along with three legal industry AI tools refined with RAG: Westlaw’s AI-Assisted Research, Ask Practical Law AI (both Thomson Reuters products), and Lexis+ AI, concluding that all four tools hallucinate. The hallucination rates of the RAG-tuned AI tools tested in the study were reduced compared to GPT-4 (which it found hallucinated 43% of the time) yet remained substantial. The study found that Westlaw’s AI-Assisted Research hallucinated one-third of the time, while Ask Practical Law AI and Lexis+ AI produced hallucinations in more than one of every six responses. LexisNexis and Thomson Reuters both responded that their internal testing and customer feedback demonstrated higher rates of accuracy than the study results, with Thomson Reuters asserting an accuracy rate of approximately 90% for its AI-Assisted Research tool. While the results of this study are already dated given the recent swift progression of AI developments, the Stanford study identified that the most important takeaway of its results was that the legal industry needs thorough and transparent benchmarks and evaluations of AI tools. This study can be accessed here.

What insights do these studies collectively provide?  

When these studies are considered collectively, it becomes evident that lawyers should not summarily dismiss AI tools. Several independent studies have now concluded that using an AI tool to perform certain tasks may elevate a lawyer’s work through improved quality and/or efficiency. Tasks that were found by the studies to benefit from the use of an AI tool included contract drafting, document extraction, document question-answering, document summarization, transcript analysis, drafting emails and letters, drafting legal memos, analyzing complaints, drafting motions, and some legal research tasks.

In contrast, tasks where AI tools did not add value within the parameters of the studies included redlining, EDGAR research, and chronology generation. While the Minnesota law student study did not find added value in using AI tools to draft a non-disclosure agreement when the students were provided a general template to use in their response, lawyers can compare this finding to the more recent Legalbenchmarks.ai contract drafting study finding that some AI tools outperformed the human baseline of commercial lawyers with 10 years of experience in producing reliable first drafts of contracts. Additionally, lawyers can consider testing one or more AI tools for contract drafting to draw their own conclusions.

Over time, the findings from these studies can also be used to evaluate how AI tools are evolving. For example, when the findings of Vals AI’s LegalBench contributions are compared to the Stanford hallucination study, it appears that the accuracy of OpenAI’s GPT AI models has improved significantly since May 2024 (December 2025: 86.02% accuracy for GPT 5 on LegalBench; May 2024: 57% of GPT-4 responses free of hallucinations, based on the 43% hallucination rate found by Stanford). Because the two studies used different tasks and measures, the comparison is indicative rather than exact. The improvement is notable in part because many legal industry AI tools use OpenAI’s models and their competitors’ models as their underlying infrastructure.

Some of the studies concluded that it is a toss-up whether you can presently get better output from a general-purpose AI tool or a legal industry AI tool. Further, some of the studies note that legal industry AI tools are distinguishing themselves from the general-purpose AI models by offering better workflow integration and support. Additionally, lawyers should know that some legal industry AI tools may offer more data privacy and security advantages than consumer-grade general-purpose AI tools.

What else should lawyers consider when evaluating AI tool options?

Lawyers should be prepared to distinguish between independent studies, such as the ones discussed above, and in-house evaluations by the companies making AI tools for lawyers. Some AI tool studies are conducted by AI companies themselves and publicized for marketing purposes. While an AI tool company’s evaluations of its own product may provide useful data, it’s important to be mindful of the source of any data utilized for decision-making purposes.

Additionally, while the studies highlighted above have yielded helpful insights, the evaluations conducted to date have only assessed the tip of the iceberg. There are many uses for AI in legal practice and hundreds of AI tools for lawyers that have not been independently evaluated. This means that lawyers who will evaluate AI tool solutions beyond the tools and tasks included in the studies covered in this article should be prepared to do their own testing to determine if an AI tool is a good match for their organization.

Finally, AI tool selection should not begin and end with considering the AI tool options available. Instead, lawyers should start the AI tool selection process by gaining an understanding of the many possible uses that AI tools currently offer and prioritizing the technology issues experienced by their organizations. AI tools for legal research command significant attention in the legal industry, yet many lawyers have not taken time to consider whether legal research is really the highest priority technology issue that their organization needs to address with an AI tool.

Once a lawyer has clarity about where improved technology solutions are most needed in their unique practice, the information in this article becomes most useful, and better-informed decisions can be made about which AI tools deserve further consideration. Further evaluation of an AI tool prior to final selection may include testing the AI tool to assess its real-world performance and should always include a risk assessment of the AI tool’s data privacy and security policies to confirm alignment with a lawyer’s professional responsibilities. ♦

Want to learn more about AI tools for lawyers? Through June 30, 2026, use the code BIZ60 for $60 off Jennifer Ballard’s “How to Pick the Best AI Tools for Your Law Practice” CLE. Learn more here.

Generative AI in Practice: Copyright and Data Protection Considerations

Emily Maass and Leigh Gill, Immix Law

In December 2024, Apple released iOS 18.2, a new version of its mobile operating system that includes Apple Intelligence and offers a ChatGPT extension. While this may sound like a tech story rather than a legal one, just below the humdrum march of technology are important legal considerations. Though artificial intelligence (AI) has been around for generations in an academic context, it has only recently become mainstream, raising important considerations for business attorneys. Before you click “update,” it is important to consider both the copyright and data protection implications of incorporating this technology into your practice.

Copyright and AI

In the two years since the release of ChatGPT, technology lawyers have seen the ubiquity of AI tools (and the adoption of those tools) outpace the developing law in the field of copyright. There are two key issues:

  • What is the propriety of using someone’s copyrighted work to train an AI?
  • Who owns the output of the AI tool?

These issues are intertwined, and there are currently more questions than answers. Ongoing litigation is extensive on the first issue, and in most cases the defendants (developers of AI tools) are asserting a fair use defense. Mark Lemley, a tech law luminary, has opined that permitting copying of works for a non-expressive purpose such as training an AI is consistent with copyright law’s objectives. If successful, a fair use defense would help reduce the universe of possible answers to the second question, but it wouldn’t answer the question.

The generative AI on the market is powered by machine learning algorithms, which means that the output is dependent on patterns found within large databases of information. For example, chatbots and the spelling suggestions on your phone produce each word in a sentence by predicting it from the sequence of words that precedes it, matched against a database of similar content. AI databases are typically black boxes, and there’s no clarity as to which copyrighted works may be in the database. Extensive litigation is ongoing: authors and publishers assert that their copyrighted works are infringed by inclusion in the database. Tech companies respond that databases are transformative, the output doesn’t match the input, and any use of copyrighted works is fair use and non-infringing.
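To make the idea of word-by-word prediction concrete, here is a deliberately simplified toy in Python. It only counts which word most often follows another in a tiny made-up text; commercial generative AI tools rely on far larger datasets and neural networks rather than simple counts. The sample text and the function name are hypothetical and are used for illustration only.

```python
from collections import Counter, defaultdict
from typing import Optional

# Tiny made-up training text; real systems learn from vastly larger data.
corpus = (
    "the court granted the motion . "
    "the court denied the motion . "
    "the court granted the request ."
)

# Count which word follows each word in the training text.
followers = defaultdict(Counter)
words = corpus.split()
for current, following in zip(words, words[1:]):
    followers[current][following] += 1


def predict_next(word: str) -> Optional[str]:
    """Return the word most often seen after `word`, or None if unseen."""
    if word not in followers:
        return None
    return followers[word].most_common(1)[0][0]


print(predict_next("court"))
```

In this toy, the suggestion after "court" is "granted" simply because that pairing appears most often in the sample text. The point for the copyright discussion is that the output is shaped entirely by whatever material the system was given to learn from.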

For consumers (including lawyers) who choose to use AI tools to generate new content, there is a somewhat separate question of ownership in the resulting work. If the content owners are successful in proving infringement, they could also assert that output of the tools is a derivative work in which they have rights. If the fair use defense is successful, the technology companies may claim ownership in the output. (Read your terms of use—commonly used AI tools typically do not claim ownership from users. Microsoft’s tools claim only limited use of customer data and allow users to own the output of its Copilot product, even going so far as offering to defend copyright claims arising from use of Copilot.) Only time and extensive litigation will determine whether fair use applies.

Agencies responsible for the administration of intellectual property laws have been quicker than courts to provide guidance, but there remains significant uncertainty. The U.S. Copyright Office has stated that it will issue a copyright registration to a human author who provides a work that was generated with AI tools only if the human (and not the AI) selected, arranged, and otherwise created the expression. The Copyright Office has refused registration for works that were machine created, regardless of how many programming decisions were involved in directing that machine to produce the output.

Guidance from the Copyright Office distinguishes between “assistive uses” of AI systems and “prompt engineering” on page eighteen of its copyrightability report: “The Office concludes that…prompts alone do not provide sufficient human control to make users of an AI system the authors of the output….While highly detailed prompts could contain the user’s desired expressive elements, at present they do not control how the AI system processes them in generating the output.” Quite apart from the ethical issues of using this developing technology in practice, if a lawyer uses a machine to produce a work product, there are no rights of authorship in that work product.

Legal AI and data protection

Attorneys are not immune from the pressure to incorporate AI into the tools of our trade. A quick online search lists dozens of tools claiming to leverage AI to make your practice faster, better, smarter, and more profitable than opposing counsel. Attorneys are expected to maintain competence with technology in their legal practice, and a firm’s comfort with adopting new technology can be determinative of its capacity for growth and longevity in an increasingly challenging legal market.

While this drive to innovate is nothing new for the legal profession, neither is the persistent nagging concern of how innovation may clash with our age-old promise to preserve client confidential information. Not all AI is created equal, and advertising a technology tool as “AI for lawyers” does not guarantee that the developers offer a product that can stand up to an attorney’s obligations to their clients. When considering adoption of a given AI tool, the question of whether it is designed to support a lawyer’s confidentiality obligations should be top of mind.

AI tools are frequently black boxes with respect to data provenance and disposition. This places a heavy due diligence burden on the law practice to thoroughly understand how the AI tool was trained and how the AI tool will use and protect the practice’s data once it is entrusted to the AI tool. As a starting point, consider these questions when performing due diligence on a potential new AI tool for your practice:

  • What data is used as training data for the AI tool? Can the vendor confirm that it was lawfully obtained and can be used by you for any purpose without infringing on the rights of third parties?
  • Does the vendor grant itself a broad license to use your data or disclose it to third parties? Check the terms and conditions, which are typically not up for negotiation.
  • Will your data be segregated on the AI tool’s systems, or combined with other users’ data?
  • Will the data you put into the AI tool (e.g., details about your practice, cases, work product) be used as training data?
  • Is it possible that your data (or your client’s data) could appear in another user’s output?

Frustratingly, it’s not uncommon for these questions to be met with somewhat vague responses that raise even more questions. If you choose to incorporate AI into your practice, there are some steps you can take to help safeguard your data and your clients’ confidentiality:

  • Adjust your software settings to prevent the AI tool from running constantly in the background or otherwise automatically collecting data from your email, phone calls, or other device applications where you input, process, or store sensitive or confidential information.
  • Turn off the AI tool’s “wake word” or any other setting where the AI tool tries to guess that it should start recording, to prevent any unintended collection of data. Configure your privacy settings so that you are required to turn on the AI tool directly. (See Lopez et al. v. Apple Inc., Case No. 5:19-cv-04577.)
  • Avoid inputting confidential information into an AI tool unless the vendor can provide you with legally binding assurance that the AI tool is expressly designed for the practice of law and safeguarding sensitive data.
  • Always notify your clients and obtain their consent before using an AI assistant in meetings or conversations, recording a phone call, or using other AI tools when working with their confidential information.
  • Regularly check your software and devices for recordings or other data storage from AI tools to confirm that the AI tool is only collecting data when directly prompted by you.
  • Configure your settings to automatically delete your data at regular intervals (e.g., every thirty days). Confirm that the deletion is permanent and that your data is not being stored elsewhere on the vendor’s systems.
  • Beware of relying too heavily on AI-generated outputs. Outputs containing factual statements, quotations, or citations might be the result of an AI hallucination. Also, remember that AI outputs are only as good as the training data used to develop the tool and the clarity of your prompt.

Lawyers must be aware of newly developing technology, and they have a duty of competence in the tools they use. AI is everywhere, and it is bound to become a key underlying technology in the practice of law. In these days of early adoption, attorneys must take care when selecting AI-driven technology solutions, with a focus on client confidentiality and quality work product. AI tools can be a valuable resource, but they are not a substitute for good, careful lawyering. ♦