information-retrieval

Exploration of information retrieval topics
git clone git://git.laack.co/information-retrieval.git



# Crawling

This directory supports web crawling. It should contain everything needed to set up your crawler and your queueing database.

## DB

The queue is backed by PostgreSQL, chosen for consistency. The database name is 'crawling'.


### Schema

crawling:
    - queued_site(url, creation_timestamp, status, claimed_at, crawl_requests, depth)
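
As a rough illustration of how the `queued_site` table above might be created and used as a queue, here is a minimal Python sketch. Only the table and column names come from this README. The column types, the `'queued'` and `'claimed'` status values, the choice of `url` as primary key, and the `claim_next` helper are all assumptions made for the example. The function accepts any DB-API style connection to the 'crawling' database (for instance one opened with a PostgreSQL driver such as psycopg).

```python
# Hypothetical sketch: types, defaults, and status values are assumptions.
# Only the table name and column names are taken from the README.
CREATE_QUEUED_SITE = """
CREATE TABLE IF NOT EXISTS queued_site (
    url                TEXT PRIMARY KEY,
    creation_timestamp TIMESTAMPTZ NOT NULL DEFAULT now(),
    status             TEXT NOT NULL DEFAULT 'queued',
    claimed_at         TIMESTAMPTZ,
    crawl_requests     INTEGER NOT NULL DEFAULT 0,
    depth              INTEGER NOT NULL DEFAULT 0
);
"""

# Claim the oldest queued row. FOR UPDATE SKIP LOCKED (PostgreSQL 9.5+)
# lets concurrent workers skip rows another worker is already claiming.
CLAIM_NEXT = """
UPDATE queued_site
SET status = 'claimed', claimed_at = now()
WHERE url = (
    SELECT url FROM queued_site
    WHERE status = 'queued'
    ORDER BY creation_timestamp
    LIMIT 1
    FOR UPDATE SKIP LOCKED
)
RETURNING url, depth;
"""


def claim_next(conn):
    """Claim one queued URL; return (url, depth), or None if nothing is queued."""
    cur = conn.cursor()
    try:
        cur.execute(CLAIM_NEXT)
        row = cur.fetchone()
    finally:
        cur.close()
    conn.commit()
    return row
```

Setting `claimed_at` in the same statement that flips `status` would let a separate job find and requeue claims that have gone stale. The sketch leaves `crawl_requests` untouched because its exact meaning is not specified here.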