Databases, creativity, and copyright -- what's missing is wrong?

From: Karsten M. Self <kmself[_at_]ix.netcom.com>
Date: Sun, 27 Sep 1998 09:53:14 +0000

I've just been reading an analysis of Feist Publications v. Rural Telephone Service, which established that a phone directory is not subject to copyright protection.

Opinion at:
http://caselaw.findlaw.com/scripts/getcase.pl?navby=search&linkurl=<%LINKURL%>&graphurl=<%GRAPHURL%>&court=US&case=/data/us/499/340.html

The analysis suggests the case was decided in Feist's favor on the principle that:

> (a) Article I, § 8, cl. 8, of the Constitution mandates originality as a
> prerequisite for copyright protection. The constitutional requirement
> necessitates independent creation plus a modicum of creativity. Since
> facts do not owe their origin to an act of authorship, they are not
> original, and thus are not copyrightable. Although a compilation of
> facts may possess the requisite originality because the author typically
> chooses which facts to include, in what order to place them, and how to
> arrange the data so that readers may use them effectively, copyright
> protection extends only to those components of the work that are
> original to the author, not to the facts themselves. This
> fact/expression dichotomy severely limits the scope of protection in
> fact-based works. Pp. 344-351.

with the body of the opinion suggesting that Rural's white pages listing fell short of this mark, being an alphabetical compilation of all phone listings within Rural's service area, as required by Kansas state law.

Among the fallout of this ruling is the database bill of the current legislative session.

As someone with a background in database compilation and analysis, I can't help but wonder whether the Court was looking in the right place in determining "originality" and "a modicum of creativity" in Rural's white pages. There are two tricks in compiling a substantial database:

I'm currently making a good wage validating a third party's attempt to clean up dirty data in a 100-million-record financial accounts database. An error rate of 0.1% could result in more than $100 million in misattributed charges, not to mention legal risks for incorrectly reporting or acting on information. Simply compiling data is not sufficient -- the data must be accurate.

Errors can be introduced in many ways -- data may be miskeyed, falsely provided, superseded, or out of date. Test, training, and system data may enter a production database. Incorrectly classified data may be included in a file.
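To make those error classes concrete, here is a minimal sketch in Python of the kind of rule-based screening involved. Everything in it is hypothetical: the Record layout, the 8-digit account format, the "prod" environment flag, and the staleness cutoff are assumptions made for illustration, not details of the actual project. Falsely provided data is deliberately left out, since it generally can't be caught by a mechanical rule.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Record:
    account_id: str    # hypothetical format: exactly 8 digits when keyed correctly
    amount_cents: int
    as_of: date        # date the record was last confirmed
    environment: str   # hypothetical flag: "prod", "test", "training", ...


def find_problems(records, stale_before):
    """Return {index: [reasons]} for every record that fails a check."""
    # Most recent as_of date seen for each account, used to spot superseded rows.
    latest = {}
    for r in records:
        if r.account_id not in latest or r.as_of > latest[r.account_id]:
            latest[r.account_id] = r.as_of

    problems = {}
    for i, r in enumerate(records):
        reasons = []
        if not (len(r.account_id) == 8 and r.account_id.isdigit()):
            reasons.append("miskeyed")
        if r.environment != "prod":
            reasons.append("non-production")
        if r.as_of < stale_before:
            reasons.append("out of date")
        if r.as_of < latest[r.account_id]:
            reasons.append("superseded")
        if reasons:
            problems[i] = reasons
    return problems


records = [
    Record("12345678", 1000, date(1998, 9, 1), "prod"),  # clean
    Record("1234567X", 500, date(1998, 9, 1), "prod"),   # miskeyed id
    Record("87654321", 250, date(1998, 9, 1), "test"),   # test data in production
    Record("12345678", 900, date(1998, 3, 1), "prod"),   # older copy of record 0
    Record("11112222", 100, date(1990, 1, 1), "prod"),   # long out of date
]

problems = find_problems(records, stale_before=date(1995, 1, 1))
clean = [r for i, r in enumerate(records) if i not in problems]

for i, reasons in sorted(problems.items()):
    print(i, records[i].account_id, ", ".join(reasons))
```

The point of the sketch is that the interesting decisions live in the rules and the cutoff, not in the surviving rows: what ends up in `clean` is defined entirely by what was judged not to belong.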

Cleaning and validating the data alone cost over $1 million; validating the validation has occupied three analysts for over a month. Much of the processing is automated, but manual intervention, interpretation, and further analysis are required. Some of the techniques are quite inventive. Though only a small fraction of records are disposed of differently, the value added is tremendous. Our results suggest that cleaning and validation affected 2-5% of records, or up to $5 billion in charges.
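As a rough cross-check of the figures in this post, the arithmetic can be sketched in a few lines of Python. It rests on an assumption that is not stated anywhere above: that charges are spread evenly across records, so a given share of records corresponds to the same share of dollars. The total it derives is therefore only an implied figure, not a reported one.

```python
from fractions import Fraction

# Assumption (not from the post): charges are spread evenly across records,
# so a share of records maps onto the same share of dollars.
RECORDS = 100_000_000                      # "100 million record" database
UPPER_AFFECTED_SHARE = Fraction(5, 100)    # upper end of "2-5% of records"
UPPER_AFFECTED_CHARGES = 5_000_000_000     # "up to $5 billion in charges"
ERROR_RATE = Fraction(1, 1000)             # the 0.1% error rate


def implied_total_charges(affected_charges, affected_share):
    """Total charges implied if affected_share of them equals affected_charges."""
    return affected_charges / affected_share


total = implied_total_charges(UPPER_AFFECTED_CHARGES, UPPER_AFFECTED_SHARE)
misattributed = total * ERROR_RATE
bad_records = RECORDS * ERROR_RATE

print(f"implied total charges:    ${int(total):,}")
print(f"charges at 0.1% errors:   ${int(misattributed):,}")
print(f"records at 0.1% errors:   {int(bad_records):,}")
```

Under that assumption the two dollar figures fit together: the implied total is about $100 billion, of which 0.1% is about $100 million, spread over roughly 100,000 records.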

In focusing on the data presented in a compilation, rather than on the data and errors removed by conscious design, did the Court misplace its attribution of creativity and originality in compiling a database?

To borrow from the Tao Te Ching:

> THE VALUE OF THE UNEXPRESSED.
>
> The thirty spokes join in their nave, that is one; yet the wheel
> dependeth for use upon the hollow place for the axle. Clay is shapen
> to make vessels; but the contained space is what is useful. Matter is
> therefore of use only to mark the limits of the space which is the
> thing of real value.

I'm suggesting that, in a strict compilation, it is the missing data and the lack of errors that provide the value, constitute the unique attribute, and supply the originality of a database.

If data selection, quality assurance, and validation routines are original and creative activities, were they raised in the arguments in this case? I find no indication that they were considered significant in the ruling. If these activities are sufficient to impart originality to the resulting database, shouldn't databases be considered copyrightable, and protected (as a whole) under the existing Copyright Act of 1976, as amended?

Is the 1998 Database Act really necessary?

-- 
Karsten M. Self (kmself[_at_]ix.netcom.com)

    What part of "Gestalt" don't you understand?
    Welchen Teil von "Gestalt" verstehen Sie nicht?

web:       http://www.netcom.com/~kmself/
SAS/Linux: http://www.netcom.com/~kmself/SAS/SAS4Linux.html    

  1:31am  up 11 days,  1:20,  6 users,  load average: 1.00, 1.04, 1.11
Received on Sun Sep 27 1998 - 09:55:09 GMT
