Re: search engines destroy western civilization

From: <kmself[_at_]ix.netcom.com>
Date: Fri, 16 Jun 2000 13:16:41 -0700

On Wed, Jun 14, 2000, Keith E. Taber <keith[_at_]drylaw.com> wrote:
>
> On Sun, Jun 11, 2000, Karsten M. Self <kmself[_at_]ix.netcom.com> wrote:
> >
> > On Thu, May 18, 2000, JQ Johnson <jqj[_at_]darkwing.uoregon.edu> wrote:
> > >
> > > On Tue, 16 May 2000, Peter Hirtle <pbh6[_at_]cornell.edu> raised an
> > > important point that I think is worth our revisiting:
> > > >
> > > > If Napster is illegal, wouldn't Google or Lycos (which similarly
> > > > index copyrighted material, some of which may have been posted
> > > > without the owner's permission) also be illegal? How do you
> > > > distinguish intent in this case?
> > >
> > > Under what legal theory can we justify the existence of search
> > > engines as they are currently implemented? Unlike Napster, they
> > > make actual copies of copyrighted materials (web pages), and
> > > distribute them on request to the search engine's users. Although
> > > the original search engines often copied only individual words,
> > > modern search engines often have copies of the complete document
> > > as spidered (that's the Google "show matches" link).
> >
> > Search engines index information which largely (though not exclusively)
> > is being placed on a publicly accessible network, by the authors or
> > rights-holders of the works themselves. The status of Napster copies
> > is rather less clear, and I'd venture to guess that in many cases the
> > rights holder is not the agent placing content online.
> >
> > > In many cases the authors of web pages may have good reasons for
> > > not wanting their pages indexed and mirrored, even if they have
> > > authorized the original web site copy, for example because they
> > > don't want an outdated page available.
> >
> > Cf: AOL in this instance. News sites in particular have many
> > reasons for disliking AOL's caching mechanisms and policies, which
> > often propagate stale content and skew web hit statistics. Of course,
> > so do other mechanisms such as local browser caching and caching
> > proxies such as Squid, which I run.
> >
> > > It has been argued in the past that such copying by the search
> > > engine was legal under an implied license theory. However, some
> > > of the pages include explicit language in their text denying such
> > > a license, yet they are copied anyway.
> >
> > Given that web spiders are mechanical rather than semantic agents,
> > this would require a rather complex AI interpretive intelligence to
> > be added to robots.
> >
> > > It has been argued that any web publisher who wants to protect her
> > > pages has the option of including an appropriate META tag or other
> > > technological solution. However, the identities and use of such
> > > tags are clearly not explicitly part of the law, it's not an
> > > approach that is well known to most web authors (I'm a fairly
> > > knowledgeable web publisher, and would have to look it up then
> > > decide which of several competing standards is actually
> > > appropriate), and it may be difficult for an author to implement
> > > depending on the particular publishing technology she uses (for
> > > example, any dependence on robots.txt assumes that the author is
> > > the web server administrator).
> >
> > I'd disagree with this assessment. The information is readily
> > obtainable. As an example, AltaVista includes the following page
> > which describes how to get a website de-listed:
> >
> > http://doc.altavista.com/adv_search/ast_haw_avoiding.shtml
> >
> > It is, intuitively enough <g>, reachable from the "Add a URL" link
> > on the main AltaVista page. It points to definitive information at:
> >
> > http://info.webcrawler.com/mak/projects/robots/robots.html
> >
> > I found both links in about a minute, knowing vaguely that there was
> > some information to be had at AltaVista, and, while tech-savvy, not
> > being a web developer by title or training. Searching "robots
> > exclusion" at Google turns up a number of relevant hits. My own
> > suggestion would be that a professional web developer unaware of
> > these resources or practices is not worth their pay.
> >
> > I suspect the law recognizes conventions which aren't explicit
> > statements of legal usage terms in other areas (eg: paperback
> > books which have been "destroyed" by removal of covers). Anyone
> > have specific practices and information on their legal status?
>
> While the topic of unwanted indexing is important, it seems to me
> that the more accurate analogy to the Napster uproar would be to
> web pages that are not authorized publications in the first place.
> (I believe that is where this thread began ... rehashing won't hurt.)

We've covered this ground, but the distinction is helpful:

    The issue is unlicensed placement of works on the web, not what
    happens to these works by automated processes once they are put
    there.

> Assume I were to digitize and provide online the complete works
> of John Grisham, including every agonizing newspaper or magazine
> interview. Would the search engines be liable for listing my site
> when another avid fan searched for "John Grisham" on Altavista or
> Google? Is the search engine liable for making sure that the
> information on every site that it lists is licensed to or owned by
> the site operator? That is the duty Napster is being charged with.

Consider this: copyright defenders are very likely to use these same
search indexes to locate and identify works they are concerned with
on the Web. Setting up, running, providing storage for, and
maintaining a web index is a fairly considerable operation. Why not
use the existing facilities?

I'd also expect that the "big name" search engines represent a
smaller target set, and a more risk-averse (and susceptible)
population, than various independent pirates. It's also frequently
the case that big-ticket piracy issues involve large-scale
black-market operations (eg: software bazaars in Hong Kong and
E. Europe) and corporate use of copyrighted materials (eg: the Exxon
library photocopying case, IIRC in the 1970s, which led to the
formation of the CCC, the Copyright Clearance Center).

> This would require far more than sophisticated AI 'bots and could
> not be solved by any sort of metatag or robots.txt file.

IMO technology isn't the solution here. Applying Lessig's analysis
of the four regulators (law, markets, norms, and architecture), the
three non-technical ones (commerce, law, and moral suasion, i.e.
social norms) are more likely to be effective. Technology is IMO not
going to favor strong IP controls, though it can facilitate
enforcement through searches.
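To make concrete what the two opt-out mechanisms discussed earlier in the thread (robots.txt and the robots META tag) can and cannot express, here is a minimal sketch of how a well-behaved robot might honor them. Python is my choice; it uses only the standard library's urllib.robotparser and html.parser. The robots.txt contents, the example.com URLs, and the helper names robots_allows and meta_allows_indexing are hypothetical, for illustration only. Note that both checks answer only "may a robot fetch or index this page?"; neither carries any information about whether whoever posted the page holds the rights to it, which is the gap Keith points to.

```python
import urllib.robotparser
from html.parser import HTMLParser

# Hypothetical robots.txt: asks every robot to stay out of /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def robots_allows(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the robots.txt text lets user_agent fetch url."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # HTMLParser lowercases tag and attribute names; values may be None.
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "robots":
            for token in attr_map.get("content", "").split(","):
                self.directives.add(token.strip().lower())


def meta_allows_indexing(html: str) -> bool:
    """Return False if the page carries a robots META 'noindex' directive."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" not in parser.directives
```

For example, robots_allows(ROBOTS_TXT, "ExampleBot", "http://example.com/private/draft.html") returns False, while the same call for http://example.com/index.html returns True. Nothing in either file or tag says who owns the content, so no amount of robot compliance answers the licensing question.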

--
Karsten M. Self <kmself@ix.netcom.com>         http://www.netcom.com/~kmself/
  Evangelist, Opensales, Inc.                       http://www.opensales.org/
   What part of "Gestalt" don't you understand?      Debian GNU/Linux rocks!
     http://gestalt-system.sourceforge.net/      K5: http://www.kuro5hin.org/
GPG fingerprint: F932 8B25 5FDD 2528 D595  DC61 3847 889F 55F2 B9B0
Received on Fri Jun 16 2000 - 20:19:14 GMT
