Check for fake googlebot scrapers
Wed, 10/05/2011 - 12:13 — sandip
I noticed a bot scraping the site with a fake Googlebot user-agent string.
Here is a one-liner that lists the IPs to consider banning:
$ awk 'tolower($0) ~ /googlebot/ {print $1}' /var/www/httpd/access_log | grep -v '^66\.249\.71\.' | sort | uniq -c | sort -n
It runs a case-insensitive awk search for "googlebot" in the Apache log file, drops IPs starting with "66.249.71." (a range that belongs to Google), and prints the remaining IPs sorted by hit count. Google crawls from other ranges too, so treat the output as a list of suspects rather than confirmed fakes.
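You can validate an IP with a reverse lookup followed by a forward lookup:
IP=66.249.71.37 ; reverse=$(dig -x $IP +short | grep '\.googlebot\.com\.$') ; ip=$(dig $reverse +short) ; [ "$IP" = "$ip" ] && echo $IP GOOD || echo $IP FAKE
The grep pattern is anchored to the end of the hostname so that a name like "googlebot.com.example.net" does not pass. Replace the IP value with the one you want to check.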
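To check a whole list of suspects instead of one IP at a time, the same reverse-then-forward lookup can be wrapped in a couple of shell functions. This is only a rough sketch: the function names is_google_host and check_ip are made up for illustration, and accepting google.com as well as googlebot.com hostnames is my assumption about which crawler domains to trust.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a PTR hostname ends in a crawler domain.
# Assumption: googlebot.com and google.com are the suffixes to accept.
is_google_host() {
    case "$1" in
        *.googlebot.com.|*.googlebot.com|*.google.com.|*.google.com) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical helper: reverse lookup, suffix check, then a forward lookup
# that must return the original IP.
check_ip() {
    ip=$1
    host=$(dig -x "$ip" +short | head -n 1)
    if [ -n "$host" ] && is_google_host "$host" &&
       dig "$host" +short | grep -qxF "$ip"; then
        echo "$ip GOOD"
    else
        echo "$ip FAKE"
    fi
}

# Usage, feeding in every IP that claimed to be Googlebot:
#   awk 'tolower($0) ~ /googlebot/ {print $1}' /var/www/httpd/access_log |
#       sort -u | while read -r ip; do check_ip "$ip"; done
```

Each IP costs two DNS queries, so on a busy log it may be worth running this only on the top offenders from the hit count.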
ssh keygen RSA versus DSA
Fri, 05/06/2011 - 10:55 — sandip
When generating ssh keys, I usually choose the RSA type because it can produce 2048-bit keys, while ssh-keygen restricts DSA keys to exactly 1024 bits.
ssh-keygen -t rsa -b 2048
Week of Month
Thu, 01/08/2009 - 10:24 — sandip
Here is a simple one-liner to get the week of the month via awk from `cal` output:
$ cal | awk -v date="`date +%d`" '{ for( i=1; i <= NF ; i++ ) if ($i==date) { print FNR-2} }'
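The same number can also be worked out arithmetically, without parsing `cal` at all. This is a sketch under two assumptions: weeks start on Sunday (as in the default `cal` layout), and the function name week_row is made up for illustration. It takes the day of the month and the weekday of the 1st, and returns the calendar row the day falls in.

```shell
#!/bin/sh
# Hypothetical helper: which calendar row (week of month) a day falls in.
#   $1 = day of month (1-31, no leading zero)
#   $2 = weekday of the 1st of the month (0=Sunday ... 6=Saturday)
week_row() {
    day=$1
    first_dow=$2
    # Shift the day by the blank cells before the 1st, then count full rows.
    echo $(( (day + first_dow - 1) / 7 + 1 ))
}

# Usage with GNU date (assumption: `date -d` and `%-d` are available):
#   first_dow=$(date -d "$(date +%Y-%m-01)" +%w)
#   week_row "$(date +%-d)" "$first_dow"
```

The `%-d` format drops the leading zero, which matters because shell arithmetic would otherwise read "08" as an invalid octal number.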
highlight grep search string
Fri, 12/19/2008 - 15:09 — sandip
Make grep highlight the search string by default by adding the alias below to your ~/.bashrc file:
alias grep='grep --color=auto'
analog filesize limit
Tue, 09/16/2008 - 16:05 — sandip
I had some trouble with analog's monthly stats not showing up for the last week, and figured out that analog refuses to parse huge log files. One log had grown to 3GB without being rotated, and analog would error out with:
/usr/bin/analog: Warning F: Failed to open logfile
/var/log/httpd/access_log: ignoring it
After running gzip on the log file, analog was able to produce the reports. I recall reading somewhere that the limit may be 2GB, but I have not tested this.
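A small check run before analog can warn when a log has grown past that size. This is only a sketch: the 2 GiB threshold is the untested figure mentioned above, not a confirmed analog limit, and the names LIMIT_BYTES, exceeds_limit and check_log are made up for illustration.

```shell
#!/bin/sh
# Assumed threshold: 2 GiB (2^31 bytes). This is a guess, not a documented limit.
LIMIT_BYTES=2147483648

# Hypothetical helper: succeed if a byte count reaches the assumed limit.
exceeds_limit() {
    [ "$1" -ge "$LIMIT_BYTES" ]
}

# Hypothetical helper: report whether a log file looks too big to parse.
check_log() {
    # tr strips the padding some wc implementations add.
    size=$(wc -c < "$1" | tr -d ' ')
    if exceeds_limit "$size"; then
        echo "$1: $size bytes, rotate or compress before running analog"
    else
        echo "$1: $size bytes, ok"
    fi
}

# Usage:
#   check_log /var/log/httpd/access_log
```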