The Coming India Crash

Posted on March 8, 2011 by Alaska

You heard it here first, folks. With India’s tech economy built on the export of low-cost IT and clerical-type jobs from the developed world to low-wage, high-skilled India, this is not a good portent. Never mind the “computers-replace-lawyers” thrust of the article; the reality is more far-reaching. If computers can do this sort of by-rote-yet-complex litigation support, how long until they can do other rote-yet-complex tasks (like, say, programming)?

The wife and I met at my first job out of law school in 1995, where we managed cases at a litigation-support tech business that used skilled humans to index, or “code,” millions of pages of documents in complex lawsuits for litigators. We handled all the documents for cases as varied as the infamous Tobacco litigation and the Charles Ng murder trial. The document-indexing tasks of our human coders were somewhat rote but required the “installed database” of a college-educated human brain in order to be completed with little error. Then the lawyers would take the indexed-document database and search it for relevant terms — the process the article says has been replaced by computers. Reading between the lines, it seems both the document-coding and the legal-searching tasks have been computerized.

Frankly, I’m not surprised that computer programs have been able to displace both the human coders and the human lawyers; I am surprised that it happened this quickly. In our day, OCR software had a 10-20% error rate, especially with legalese. That error rate must be close to zero nowadays for these programs to be able to work.

This entry was posted in Academia and Other Nonsense, Armageddon.

8 Responses to The Coming India Crash

Chris Byrne says:

March 8, 2011 at 11:09 am

This is big in banking as well. In either case, the character error rate is actually quite high for standard English (about 1% on OCR-friendly fonts, about 3% on non-OCR-friendly fonts, and three or four times that on anything that’s faxed or photocopied at low quality), but contextual filtering can generally compensate for it properly.
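
To put those character-level rates in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes each character is misread independently, which real OCR errors are not, and the 5-character word and 80-character line lengths are illustrative assumptions rather than figures from this comment.

```python
def prob_at_least_one_error(char_error_rate: float, n_chars: int) -> float:
    """Chance that a run of n_chars characters contains one or more misreads,
    assuming every character fails independently (a simplification)."""
    return 1.0 - (1.0 - char_error_rate) ** n_chars


# Per-character rates quoted above; the labels are just for printing.
rates = {"OCR-friendly font": 0.01, "non-OCR-friendly font": 0.03}

for label, rate in rates.items():
    word = prob_at_least_one_error(rate, 5)    # assumed 5-character word
    line = prob_at_least_one_error(rate, 80)   # assumed 80-character line
    print(f"{label}: word {word:.1%}, line {line:.1%}")
```

Even a rate that sounds small per character compounds quickly over a full line, which is why the filtering step matters so much.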

In medical, legal, engineering, and complex banking documents, though, there are a lot of terms with a high degree of ambiguity in conformation (similar spelling, similar letter shapes, one-letter-off errors, etc.) that is not derivable from context, and which carry what we would call a high error cost. That is, confusion between two similar terms can cause a dramatic change in meaning.

This is why medical transcription is always human verified, against original voice recordings and physical notes. It’s just too easy for a machine to mistake one medication for another, or one dosage for another etc…
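
A minimal sketch of why such one-letter-off terms are hard to catch automatically: if a misread produces another valid term, a dictionary check has nothing to flag. The word list and function names below are hypothetical, chosen only for illustration, and the distance function is plain Levenshtein edit distance.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum single-character insertions,
    deletions, or substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


# Hypothetical vocabulary of terms that are all valid on their own.
VOCABULARY = {"statute", "statue", "mg", "mcg"}


def passes_dictionary_check(word: str) -> bool:
    """A naive filter: accept anything that is a known term."""
    return word in VOCABULARY


for intended, misread in [("statute", "statue"), ("mcg", "mg")]:
    print(intended, "->", misread,
          "| distance:", edit_distance(intended, misread),
          "| passes check:", passes_dictionary_check(misread))
```

Both misreads are a single edit away from the intended term and both pass the check, so only context or a human reviewer can tell them apart.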
Chris Byrne says:

March 8, 2011 at 11:13 am

Oh, and on the machine-generated code thing (self-programming): it’s possible now, but machines create code that is functional yet so poor as to be nearly impossible to fix when it breaks, which it does, frequently.

For one thing, machines are not good at coding to the quirks, i.e., those many exceptional situations that arise in computing where the standard rules don’t apply or are suboptimal.

Also, machine-generated code tends to be overly iterative, recursive, and redundant. A computer will choose to do the same simple thing, or a set of simple things, over and over again, rather than choosing to do something the more complicated but ultimately less “expensive” way.

This results in much slower, less efficient, and overall buggier code.
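
As a toy illustration of that contrast (not the output of any actual code generator), here are two hand-written Python functions that compute the same sum. The first repeats one simple step n times; the second uses the less obvious closed-form formula and does constant work. Both function names are made up for this sketch.

```python
def sum_by_repetition(n: int) -> int:
    """Add 1 + 2 + ... + n by doing the same simple step n times."""
    total = 0
    for k in range(1, n + 1):
        total += k
    return total


def sum_by_formula(n: int) -> int:
    """Same result via the closed form n(n + 1) / 2: one multiply, one divide."""
    return n * (n + 1) // 2


# Both approaches agree; only the amount of work differs.
print(sum_by_repetition(1000), sum_by_formula(1000))
```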

This will obviously improve as time goes on, and in a lot of cases the penalty is so small, and the task of such low cost or low value, that machine-generated code makes sense to use. It’s cost- and time-effective.
Mad Rocket Scientist says:

March 8, 2011 at 11:29 am

Re: Auto-coding
What Chris said.
Kyle says:

March 8, 2011 at 11:57 am

You could also make the same statement about certain types of programmers that Chris has made about Auto-coding. 😉
Rivrdog says:

March 8, 2011 at 12:33 pm

If OCR is so good, why is it that the automated medical charting services have to have 2 levels of human backup AFTER the voice coding is transcribed? My sis was in this business and got laid off when her hospital system went to India with their transcribers, then ditched that and went with OCR for paper charts and a mix of technologies for transcription. It all failed, and the error rate now requires two levels of human backup (one of which is still in India).

Nope, when error-free work is REQUIRED, OCR is just not there yet as a technology, nor is speech recognition.
DirtCrashr says:

March 8, 2011 at 12:43 pm

It’s what happened to my graphics job: they code it in instead of pushing pixels. In the future everybody will be unemployed…
Davidwhitewolf says:

March 9, 2011 at 8:04 am

@RD: doctors’ penmanship?
JTW says:

March 15, 2011 at 1:00 am

People have been claiming that computers will replace programmers “real soon now” for at least the 15 years I’ve been working in IT, and judging by old SciFi novels, they’d been doing it for at least 15-20 years before that.

It ain’t happened yet; in fact, the more we code, the more we realise computers aren’t going to do it for us.

When I started out we were bombarded with advertorials and marketing talk about how “within 5 years” there’d be no more need for programmers, that managers could just input their business rules in human readable form and the software would magically turn it into a full blown business application without any flaws.
The tools of course never worked. At best they would produce some skeleton screens with extremely poor generated coding that a team of programmers would need to flesh out with the actual business rules, using a combination of translating those business rules into some scripting language and writing code in a native language like Java to integrate it all.
These products don’t work. They increase programmer workload compared with writing the thing from scratch, especially if there’s maintenance and changes to the system to be performed (which there always are in real systems, as opposed to sales demos).

It’s now 15 years later and we still hear the same claims, and see the same kind of products.