A great list of 50 Open Source Web Crawlers has been produced by Baiju NT on a Big Data Blog.

Web crawlers are useful for gathering data from other sites when performing research, but use them with caution: with today’s levels of protection, some sites’ defenses may treat your data gathering as an attack.
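One common way to act on that caution is to read a site's robots.txt before crawling and respect any delay it asks for. The snippet below is only a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content, the `research-bot` user agent, the example.com URLs, and the helper names `build_parser` and `may_fetch` are all made up for illustration; a real crawler would fetch the file from the target site instead.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the sketch runs offline.
# A real crawler would download this from https://<site>/robots.txt.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""

# Hypothetical user agent name for our crawler.
USER_AGENT = "research-bot"


def build_parser(robots_txt: str) -> RobotFileParser:
    """Build a robots.txt parser from text that is already in memory."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the robots.txt rules allow this agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(SAMPLE_ROBOTS_TXT)

for url in (
    "https://example.com/public/data.csv",
    "https://example.com/private/data.csv",
):
    verdict = "allowed" if may_fetch(parser, USER_AGENT, url) else "disallowed"
    print(url, "is", verdict)

# Seconds the site asks crawlers to wait between requests (None if unspecified).
print("Requested crawl delay:", parser.crawl_delay(USER_AGENT))
```

Following robots.txt does not guarantee a site will welcome your crawler, but it is a reasonable first check before sending any volume of requests.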

It’s probably best to check first whether a data set already exists with the data you are looking for.

https://www.quandl.com/ is a search engine for data sets, with 12 million listed.

There are lots of data sets available from governments such as http://data.gov.uk/ in the UK.

If a smaller list of good data sources is needed, have a look at http://www.kdnuggets.com/datasets/index.html