I didn't want to write Python, and I trust the speed of awk + uniq + sort more.

    awk -v date="$(date '+\%d/\%b/\%Y')" '($4 ~ date && $9 ~ /200/) {print $1, $7}' /var/log/nginx/access.log | sort | uniq | awk '{print $2}' | sort | uniq -c | sort -r | head -n 50

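- cron's time zone is odd and I could not work out how to configure it. The system clock is UTC+8, but tail /var/log/syslog shows cron running on UTC-5. One workaround is to add 13 hours to the schedule by hand; the cleaner fix is to replace /etc/localtime and then restart the cron service (or just reboot the system). The \% in the date format above is only needed when the command sits in a crontab, where a bare % is treated as a newline. See:

		rm /etc/localtime
		cp /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
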
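- Step 1 selects every request dated today with a 200 response and prints the client IP and the request URI.
- Steps 2 and 3 sort and deduplicate the (IP, URI) pairs, so one IP visiting one page is counted only once.
- The remaining steps print the URI, count how often each one occurs, sort by that count, and show the top 50.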
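
The steps above can be sketched as a small shell function that runs outside cron. This is only an illustration under a few assumptions: the function name top_pages and its arguments are made up here, the log is assumed to be in nginx's default "combined" format, the date is produced with LC_ALL=C so the month abbreviation matches the English one nginx writes, and the final sort uses -rn (numeric) instead of the original -r so the order does not depend on column padding.

```shell
#!/bin/sh
# top_pages LOGFILE [DAY] [N]
# Hypothetical helper, not from the original post.
# Assumes nginx's "combined" log format: $1 = client IP, $4 = timestamp,
# $7 = request URI, $9 = status code.
top_pages() {
    log=$1
    # Default to today, e.g. 10/Oct/2023. No \% here: that escape is
    # only needed inside a crontab.
    day=${2:-$(LC_ALL=C date '+%d/%b/%Y')}
    n=${3:-50}

    awk -v date="$day" '($4 ~ date && $9 ~ /200/) {print $1, $7}' "$log" |
        sort | uniq |        # one line per distinct (IP, URI) pair
        awk '{print $2}' |   # keep only the URI
        sort | uniq -c |     # number of distinct IPs per URI
        sort -rn |           # highest count first (numeric sort)
        head -n "$n"
}
```

Called as top_pages /var/log/nginx/access.log, it prints one line per URI with the number of distinct IPs in front. Passing the day explicitly, for example top_pages access.log 10/Oct/2023 10, makes it possible to rerun the report for an older date.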
Timing with time

52 MB log, 260,000 lines

    awk -v date="$(date '+%d/%b/%Y')"  /var/log/nginx/access.log  0.14s user 0.01s system 92% cpu 0.159 total
    sort  0.00s user 0.00s system 0% cpu 0.156 total
    uniq -c  0.00s user 0.00s system 0% cpu 0.145 total
    sort  0.00s user 0.00s system 0% cpu 0.136 total
    awk '{print $3}'  0.00s user 0.00s system 0% cpu 0.137 total
    sort  0.00s user 0.00s system 0% cpu 0.125 total
    uniq -c  0.00s user 0.00s system 0% cpu 0.124 total
    sort -r  0.00s user 0.00s system 0% cpu 0.114 total
    head -n 50  0.00s user 0.00s system 0% cpu 0.113 total

134 MB log, with two custom conditions added, 650,000 lines

    awk '($9 ~ /200/ && $7 !~ /#/){print $1,$7}'   0.70s user 0.09s system 53% cpu 1.457 total
    sort  0.03s user 0.01s system 2% cpu 1.530 total
    uniq -c  0.02s user 0.00s system 1% cpu 1.530 total
    awk '{print $3}'  0.02s user 0.00s system 1% cpu 1.541 total
    sort  0.02s user 0.00s system 1% cpu 1.563 total
    uniq -c  0.01s user 0.00s system 0% cpu 1.569 total
    sort -r  0.01s user 0.00s system 0% cpu 1.568 total
    head -n 50  0.00s user 0.00s system 0% cpu 1.567 total

Worth considering for a small site.
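
To repeat this kind of measurement without a production log, one option is to generate a synthetic log of a similar size first. The sketch below is only that: gen_log is a made-up helper, and the IPs, URIs, and response sizes it prints are arbitrary filler in nginx's "combined" layout, so the timings it yields will not match the numbers above exactly.

```shell
#!/bin/sh
# gen_log N DAY
# Hypothetical helper: prints N fake access-log lines for the given day
# (e.g. 10/Oct/2023) in nginx's "combined" layout. All values are made up
# and only meant to give the pipeline something realistic to chew on.
gen_log() {
    awk -v n="$1" -v day="$2" 'BEGIN {
        for (i = 0; i < n; i++) {
            ip  = "10.0." (int(i / 256) % 256) "." (i % 256)
            uri = "/page/" (i % 100)
            printf "%s - - [%s:12:00:00 +0800] \"GET %s HTTP/1.1\" 200 512 \"-\" \"-\"\n", ip, day, uri
        }
    }'
}

# Example (the output path is a placeholder):
#   gen_log 260000 "$(LC_ALL=C date '+%d/%b/%Y')" > /tmp/access.log
```

The per-command lines in the timings above have the shape of zsh's default time output, which reports each stage of a pipeline separately. In bash, prefixing the pipeline with time gives a single summary for the whole pipeline instead.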
