Where AI fails: the judgment layer
The tasks where AI fails are not harder versions of the tasks where it succeeds. They are structurally different. They require reasoning about situations the model has not encountered, understanding how provisions interact across an agreement, and making decisions that depend on information the model cannot access: the client’s risk appetite, the counterparty’s negotiation history, the regulatory trajectory in a specific jurisdiction.
Novel reasoning is the clearest failure point. Many real estate matters run on analogy: determining whether a new situation is sufficiently similar to a precedent that the same rule should apply. This is not a similarity-matching exercise. It is an evaluative judgment about which similarities are legally or practically relevant, and that judgment shifts as real estate practice and contract forms evolve. Current models can retrieve examples with overlapping fact patterns; they can just as easily surface examples that are superficially similar but legally distinct. They cannot determine whether the overlap matters in a way that would inform responsible client advice.
Contextual risk assessment exposes the gap between extraction and understanding. An AI tool reviewing a limitation of liability clause can tell you the cap is set at the value of the contract over the preceding 12 months. It cannot tell you that this cap is appropriate for a low-value software subscription and wildly inadequate for a mission-critical infrastructure service where a failure could produce orders-of-magnitude greater downstream loss. The same clause, the same language, the same extraction result, entirely different risk profiles. A lawyer or management professional knows this because they understand the business context. The model does not, and no amount of prompt engineering fixes a fundamental absence of situational awareness.
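To make that concrete, here is a minimal, hypothetical sketch (every name and number is invented for illustration, not drawn from any real tool): the extraction step produces an identical cap in both scenarios, and the adequacy verdict flips only because of business context that exists nowhere in the contract text.

```python
from dataclasses import dataclass

@dataclass
class ExtractedCap:
    # What an extraction tool can reliably produce from the clause text.
    basis: str          # e.g. "fees paid in the preceding 12 months"
    amount: float       # resolved cap value in dollars

@dataclass
class BusinessContext:
    # What the document alone cannot tell you.
    annual_fees: float
    plausible_downstream_loss: float

def cap_adequacy(cap: ExtractedCap, ctx: BusinessContext) -> str:
    """Same extracted cap, different verdicts, driven entirely by
    context that lives outside the contract."""
    ratio = ctx.plausible_downstream_loss / cap.amount
    return "adequate" if ratio <= 1 else f"inadequate ({ratio:.0f}x exposure)"

cap = ExtractedCap("fees paid in the preceding 12 months", 120_000.0)
saas = BusinessContext(annual_fees=120_000.0, plausible_downstream_loss=100_000.0)
infra = BusinessContext(annual_fees=120_000.0, plausible_downstream_loss=40_000_000.0)

print(cap_adequacy(cap, saas))   # adequate
print(cap_adequacy(cap, infra))  # inadequate (333x exposure)
```

The gap the paragraph describes lives entirely in `BusinessContext`: nothing in the clause, and nothing a model can extract from it, supplies those two numbers.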
Bespoke agreement drafting reveals the limits even in AI’s strongest domain. The same benchmark evaluations that show AI outperforming humans on mechanical drafting consistently show humans excelling at interpreting client intent, avoiding unnecessary concessions, and integrating multiple information sources into a coherent risk allocation.
Many forms of business and legal analysis cannot be automated. This is especially true when a determination depends on a document that was never provided to the model. Any workflow that relies on AI for screening or review without robust human review at every decision point is building in risk that will eventually surface at the worst possible time.
Actions to take next:
- Identify three to five recent matters where the outcome depended on judgment that AI could not have provided. Use these as concrete examples when explaining limitations to stakeholders who equate contract review demos with full legal automation.
- Establish a written policy specifying which task categories require human decision-making regardless of AI involvement. Privilege determinations, conflict checks, and risk assessments above a defined threshold should be on this list.
- Review any AI-assisted workflow that currently lacks a defined handoff point between model output and human review. If the handoff is informal, formalize it before an error forces the issue; the sketch below shows one way to make the gate explicit.
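As a minimal sketch of that formalization (all names here are hypothetical, not taken from any particular review platform), the gate can be as simple as a release function that refuses to pass model output downstream for policy-listed categories until a named human reviewer has signed off:

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum

# Hypothetical task categories mirroring the written policy above.
class TaskCategory(Enum):
    MECHANICAL_EXTRACTION = "mechanical_extraction"
    PRIVILEGE_DETERMINATION = "privilege_determination"
    CONFLICT_CHECK = "conflict_check"
    RISK_ASSESSMENT = "risk_assessment"

# Categories that always require a human decision, regardless of AI involvement.
HUMAN_REQUIRED = {
    TaskCategory.PRIVILEGE_DETERMINATION,
    TaskCategory.CONFLICT_CHECK,
    TaskCategory.RISK_ASSESSMENT,
}

@dataclass
class ModelOutput:
    category: TaskCategory
    content: str
    reviewed_by: str | None = None    # None until a human signs off
    reviewed_at: datetime | None = None

def record_review(output: ModelOutput, reviewer: str) -> None:
    """Record the human sign-off that makes the handoff auditable."""
    output.reviewed_by = reviewer
    output.reviewed_at = datetime.now(timezone.utc)

def release(output: ModelOutput) -> str:
    """Release model output downstream only if the policy gate is satisfied."""
    if output.category in HUMAN_REQUIRED and output.reviewed_by is None:
        raise PermissionError(
            f"{output.category.value} requires human review before release"
        )
    return output.content

draft = ModelOutput(TaskCategory.RISK_ASSESSMENT, "Cap appears adequate.")
# release(draft)  # would raise PermissionError: no human sign-off yet
record_review(draft, reviewer="a.partner@firm.example")
print(release(draft))  # now permitted, with an audit trail
```

The design choice that matters is that the sign-off is recorded rather than assumed: an informal "someone looked at it" becomes a named reviewer and a timestamp, which is what turns a habit into a defined handoff.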
