May 18, 2023
ROCKVILLE, MD–In a surprise announcement, Lexerica, a 23-employee, privately funded startup based in Rockville, Maryland, revealed Friday that it has completed an initial draft sequence of the English language. “This is a real David and Goliath story, a triumph for the little guy with the big idea,” exclaims Lucian Blunderbuss, founder, Chairman, and CEO of Lexerica. “We tackled the sequencing problem using only a fraction of the resources and employee-hours the experts said we would need.”
The sequence, which consists of “all well-formed and near-well-formed expressions possible in the English language,” was assembled with the help of a network of sophisticated computers employing a proprietary ‘scattershot’ sequencing algorithm developed by Blunderbuss himself.
“The trick to building the sequence is realizing that it’s more about figuring out what to include than figuring out what to exclude,” explains Blunderbuss. “Rather than building a database of all mathematically possible sequences of letters and then sifting through that for well-formed expressions, we took a large set of well-formed expressions, smashed them together, and then reformed them, looking for expressions in what resulted.”
Blunderbuss, once Chief Technical Director of the Public Domain Sequencing Project, a publicly funded effort to produce a similar sequence, broke with the Project over its reluctance to make use of the ‘scattershot’ technique. “Lucian is undoubtedly brilliant,” opines Georgiana Jumper, Chair of the PDSP executive committee. “And we applaud him for sticking to his guns, but the Project continues to have some reservations about the technique and the completeness of the sequence it has produced.”
Organizers of the Public Project also question the propriety of a privately funded and privately owned sequence. “The Lexerica sequence raises serious questions,” notes Pilar Daise, an anaphora expert and Project researcher. “While the Project is dedicated to increasing the size of the public domain by dedicating the sequence to it, the Lexerica sequence, once published, could conceivably be the source of copyright claims against the bulk of future linguistic expression.”
In theory, the Lexerica sequence includes, according to Daise, “millions of billions of novel-length expressions,” each of which could conceivably pre-empt the work of a future Hemingway or Tolstoy. “It’s crucial that the sequence be part of the public domain,” exclaims Daise. “Otherwise the owner of the sequence will have a stranglehold on future creative work.”
Though declining to explicitly dedicate the Lexerica sequence to the public domain, CEO Blunderbuss assures critics that his company is not interested in owning new expressions. “We have yet to fully resolve our business model, but you can rest assured that we will not start by suing individual writers. That’s just not in the cards. Our initial plan is to make the sequence commercially available, on a pay-to-use basis. It’ll actually be a service for writers, who will be able to prospect for new works in the sequence, rather than starting from scratch every time.”
“We don’t really know their plans,” notes PDSP Chair Jumper. “It is interesting to note, however, that [Blunderbuss’] scattershot technique relied upon well-formed expressions from the public domain. They had to have something to start with.”
Acknowledging that the Lexerica sequence did in fact make use of a number of public domain works as “seed expressions”–including some published early portions of the Public Project’s sequence–Blunderbuss is quick to point out that Lexerica also made use of many “indigenous well-formed expressions,” including, notably, three of Blunderbuss’ published books, transcripts of 133 of his public speeches, and transcriptions of more than 1,200 hours of his personal phone conversations.