Hi!
It's my understanding that using a web spider such as wget is frowned upon because it can place a large load on the servers, and that downloading the entire database dump is the more efficient alternative.
However, my application doesn't warrant pulling down the whole database. I only need about 50 articles, plus every article those 50 link to. I'm guessing that comes to somewhere on the order of 10,000-20,000 articles, well below the size of the full database.
Therefore, would it be tolerated if I used wget or a similar spider, limited to a rate of one page per second? All of the anti-spider notes I've seen seem aimed at people trying to download all of Wikipedia at 50 pages per second.
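For what it's worth, here is roughly the kind of invocation I have in mind. This is only a sketch: the seed file, the user-agent string, and the contact address are placeholders of my own, and I've included wget's `--spider` flag so the command below is a dry run (removing it would actually save the pages).

```shell
# Hypothetical seed list; in practice this would hold my 50 article URLs.
cat > seeds.txt <<'EOF'
https://en.wikipedia.org/wiki/Main_Page
EOF

# --wait=1        : pause one second between requests (the rate I'm proposing)
# --recursive -l1 : fetch the seeds plus the pages they link to, one level deep
# --spider        : dry run only; drop this flag to actually download
# The user-agent is a placeholder so admins could identify and contact me.
wget --wait=1 \
     --recursive --level=1 \
     --input-file=seeds.txt \
     --user-agent="MyResearchBot/0.1 (contact: me@example.com)" \
     --spider \
     --tries=1 --timeout=10 || true
```

The `|| true` is just so a transient network error doesn't abort a surrounding script; wget would otherwise exit nonzero on any failed fetch.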