Data management issues

‘Word Processing’ software is legacy software, but it is still perceived as the foundation of legal software tools. It’s the engine for pumping out correspondence and legal documents. Software vendors still build their products around the ability to integrate with, or leverage, word-processing software.

As a result, we don’t have a research focus on the middle ground, where:

  • the content itself requires computational tools.
  • the data must be managed outside of the word processing environment.

When lawyers finally realise that information in Word Processors is just ‘rendered output’, and not an intelligent basis for either input or the application of data science, then we will see a shift in the type of tools that lawyers are interested in. The solution is not building tools around Word Processors, but removing word processors from the upstream workflow and using them, if at all, to tidy up the appearance of output reports.

It is not in the interests of present software vendors to treat their own products as legacy products, or as ones that need to be unpacked and treated as ‘rendering’ tools rather than information processing tools.

Once Word Processing is seen as a legacy format, we will move on to tools that help communicate legal knowledge, and interact with it. In order to interact with legal information, we need algorithms that help extract information from legacy formats, and algorithms to work with legal topics in flexible data structures and interfaces. These algorithms are recipes for transforming human ‘commands’, in an interactive environment, into ‘displays’ of the relevant information.
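As a rough illustration of a ‘command’ becoming a ‘display’, here is a minimal Python sketch. Everything in it is an assumption made for the example: the Clause record, the sample clauses, the two verbs (‘show’ and ‘count’) and the run_command function are hypothetical, and the topic labels are presumed to have been assigned already, by a person or by an earlier extraction step.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Clause:
    number: str  # clause number as it appears in the contract
    topic: str   # legal topic label (assumed to be assigned upstream)
    text: str


# Hypothetical sample data standing in for clauses pulled out of a legacy file.
CLAUSES = [
    Clause("4.1", "payment", "The Buyer must pay the Price within 30 days."),
    Clause("9.1", "indemnity", "The Seller indemnifies the Buyer against third-party claims."),
    Clause("9.2", "indemnity", "The indemnity in clause 9.1 is capped at the Price."),
]


def run_command(command: str, clauses: list[Clause]) -> list[str]:
    """Turn a short '<verb> <topic>' command into lines ready for display."""
    parts = command.strip().lower().split()
    if len(parts) != 2:
        raise ValueError(f"expected '<verb> <topic>', got: {command!r}")
    verb, topic = parts
    matches = [c for c in clauses if c.topic == topic]
    if verb == "show":
        return [f"{c.number}: {c.text}" for c in matches]
    if verb == "count":
        return [f"{len(matches)} clause(s) on {topic}"]
    raise ValueError(f"unknown verb: {verb!r}")


if __name__ == "__main__":
    for line in run_command("show indemnity", CLAUSES):
        print(line)
```

The point of the sketch is that the command uses the lawyer’s own vocabulary (a topic), and the computer does the finding. Nothing is cut, pasted or scrolled to.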

Workflow issues

The Marshall McLuhan mantra that the ‘medium is the message’ applies just as much to word processing as any other medium. Software is a medium, and word processors are a subset of software with their own specific influence on behaviour and thinking.

Word processors push you towards laborious crafting of documents, or ‘templates’, because they are very crude when it comes to navigating, storing or referring to specific legal topics. We can’t move those topics around easily, share them, refer to them or benefit from computational speed when trying to do so. In fact, the idea that speed (or directness) might be important is really not part of the conversation.

Do not be fooled into thinking that cut and paste is a tool for computation. It’s a mechanical tool, designed to make human beings responsible for very low-level information transfer, leaving the decision-making outside the computing environment. It binds your hands to the computer. That’s not how you would think in the high-performance computing world, and it’s a lesson worth applying to other areas.

Word processor tools that prioritise making electronic documents look like traditional paper ones are less interested in how we might apply computation to the contents (e.g. to legal document topics). Computation is used there only to simulate a clever, style-conscious typewriting machine. The design assumes that a typist or clerical worker is going to be working with this information, not someone who wants to use an interactive command set that relies on domain-specific (and therefore efficient) terminology.

A stagnant market

The market for Word Processors is such that word processor sellers want to sell the same tool to the world. It takes a lot more work, and potentially narrows the market, to make information-based tools for specific professions or industries. Such tools have to be good enough to attract a premium (at least for established companies). For new players, new tools might be priced to be competitive with the more generic Word Processor tools.

The current environment has achieved a ‘stable equilibrium’ because there are naive users of Word Processors who haven’t been under any critical pressure to evaluate what they do and how they do it, and so have not raised their expectations of their software tools. Software vendors have achieved market dominance with tools split into separate buckets, where text tools like word processors are inherently distinguished from spreadsheets and databases. These do not exhaust, by any means, the possible designs for data structures, interfaces or workflows. However, whilst they are prevalent, people will fit themselves to the software, rather than develop more intelligent tools to fit their work.

New tools in the middle ground

This trend toward software that stores data in the appearance-focussed file formats of ‘word processors’ has influenced what people regard as the ‘norm’. People are expected to keep returning to a manual typing process that attempts to render output as notes on a page. This is fine if you want to write, but what if you want to manage the contents as recurring, information-rich data? There is no capacity to do so.

Computer scientists have paid more attention to ‘natural language processing’ or ‘search’ algorithms than to analysing the legal content to which you can legitimately apply computation and data science to achieve benefits for data storage, data analysis, and output.

The consequence is that a topic like ‘algorithms’ as applied to legal contracts or documents hasn’t received as much attention as it should. And, as yet, there isn’t much interest in high-quality free or open-source software projects, or small-market prototypes, that adequately bridge this gap. It is a gap that shouldn’t be left to software vendors, but to people who are able to analyse their own expert domains and ask better questions. This will change, but first it will require promoters, and people who view Word Processors as a legacy format.

Paper-based legal documents have always been static versions of an interactive medium. We don’t read them in a linear way, even if they look like that. They are structured for some interaction, but it is limited by the linear medium. The opportunities now are to escape that linear world and move into the interactive ‘conversations’ that will be opened up by electronic storage formats.

The same will occur in legal proceedings involving civil litigation or disputes. On paper, the ‘conversation’ or interactive format was achieved by having a sequence of interactions: a statement of claim, and then a response in the form of a defence. This was the ‘interactive’ approach in a low-tech world. But within each of those documents there was specific interaction between particular topics or paragraphs. The true interaction between content happens at a level internal to each of these paper documents. Word processors don’t allow you to get at those paragraph-level interactions, or to put them to work. We are going to need better tools.
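To make the paragraph-level ‘conversation’ concrete, here is a minimal Python sketch, offered as one possible data structure rather than a finished design. The Allegation and Response records, the ‘admit’ and ‘deny’ stance labels and the sample pleadings are all hypothetical, and the sketch assumes each response in the defence has already been linked to the paragraph of the claim it answers.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Allegation:
    para: int  # paragraph number in the statement of claim
    text: str


@dataclass(frozen=True)
class Response:
    answers_para: int  # paragraph of the claim this response answers
    stance: str        # hypothetical labels: "admit" or "deny"
    text: str


def pair_pleadings(
    claim: list[Allegation], defence: list[Response]
) -> list[tuple[Allegation, Response | None]]:
    """Line up each allegation with its response (None if unanswered)."""
    by_para = {r.answers_para: r for r in defence}
    return [(a, by_para.get(a.para)) for a in claim]


def denied_paragraphs(claim: list[Allegation], defence: list[Response]) -> list[int]:
    """Claim paragraphs that the defence expressly denies."""
    return [a.para for a, r in pair_pleadings(claim, defence)
            if r is not None and r.stance == "deny"]


def unanswered_paragraphs(claim: list[Allegation], defence: list[Response]) -> list[int]:
    """Claim paragraphs with no linked response in the defence."""
    return [a.para for a, r in pair_pleadings(claim, defence) if r is None]


# Hypothetical sample pleadings.
CLAIM = [
    Allegation(1, "The parties entered into a contract on 1 March."),
    Allegation(2, "The defendant failed to deliver the goods."),
    Allegation(3, "The plaintiff suffered loss as a result."),
]
DEFENCE = [
    Response(1, "admit", "The defendant admits paragraph 1."),
    Response(2, "deny", "The defendant denies paragraph 2."),
]

if __name__ == "__main__":
    print("Denied:", denied_paragraphs(CLAIM, DEFENCE))
    print("Unanswered:", unanswered_paragraphs(CLAIM, DEFENCE))
```

Once the link between an allegation and its response is stored as data, questions such as ‘what is actually in dispute?’ become queries rather than an exercise in reading two documents side by side.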
