Remote Operating System Identification Using TCP/IP Stack Fingerprints: Active and Passive Identification

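While building an Amazon crawler, I found Amazon's blocking rules puzzling. The traditional techniques (spoofing browser request headers and cookies, rotating IPs) could not fully explain Amazon's anti-crawler behaviour, so some other strategy had to be at work. Amazon does not ban an IP permanently, since that would sacrifice part of its user base; a blocked IP is released again after a while.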

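What confused me:

1. With the same IP and the same scraping method, scraping from Linux was blocked, yet after switching to Windows the data could be scraped normally.

2. On Linux, I used Docker to run CentOS, Ubuntu, Debian and other distributions in various versions. With the same scraping method, the blocking results differed sharply: some containers scraped normally and others were blocked, even though they all shared the same egress IP. Why?

Guess: can Amazon see the NIC MAC address of the server or of the Docker container? Or can it identify our operating system type and version?

My initial, mistaken reasoning: if the HTTP request forges a Windows User-Agent, the only thing the server can identify should be a Windows operating system!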

User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36 

I shelved the question without thinking about it further; at the time I knew very little about the TCP/IP protocols.

........

A few months later

On a whim

I read through a lot of material

And discovered

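Protocol stack fingerprinting

Protocol stack fingerprinting is a powerful technique that can quickly determine an operating system's version with high probability. Although the TCP/IP protocol stack is defined by standards, vendors such as Microsoft and Red Hat interpreted those standards differently when writing their own TCP/IP stacks. Because these interpretations are distinctive to each implementation, they are called "fingerprints". These subtle differences make it possible to pin down the operating system version.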
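To make the idea of a "fingerprint" concrete, here is a minimal sketch of one such difference, the initial IP TTL. It assumes the commonly cited defaults (64 for Linux and macOS, 128 for Windows, with 32 and 255 used by some other systems). Those values and the function names `guess_initial_ttl` and `guess_distance` are my own illustration, not taken from any tool's source. Each router hop decrements the TTL by one, so rounding the observed TTL up to the nearest common default gives a guess at both the sender's stack family and the hop distance.

```python
# Commonly cited default initial TTL values (an assumption of this sketch).
COMMON_INITIAL_TTLS = (32, 64, 128, 255)


def guess_initial_ttl(observed_ttl: int) -> int:
    """Round the observed TTL up to the nearest common initial value."""
    if not 0 < observed_ttl <= 255:
        raise ValueError(f"TTL out of range: {observed_ttl}")
    for initial in COMMON_INITIAL_TTLS:
        if observed_ttl <= initial:
            return initial
    raise AssertionError("unreachable: 255 is the maximum TTL")


def guess_distance(observed_ttl: int) -> int:
    """Estimate how many hops the packet travelled."""
    return guess_initial_ttl(observed_ttl) - observed_ttl


if __name__ == "__main__":
    for ttl in (52, 64, 113):
        print(ttl, guess_initial_ttl(ttl), guess_distance(ttl))
```

A single field like this is weak evidence on its own; fingerprinting tools combine it with the window size, the order of TCP options and other quirks.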

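Because of its high accuracy, TCP/IP stack fingerprinting is widely used in well-known security tools such as Nmap and p0f. It comes in two forms: active identification and passive identification.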

p0f, a passive identification tool

# Install
yum install p0f
[root@izwz9bb1rjtnk ~]# p0f -h
--- p0f 3.09b by Michal Zalewski <lcamtuf@coredump.cx> ---

p0f: invalid option -- 'h'
Usage: p0f [ ...options... ] [ 'filter rule' ]

Network interface options:

  -i iface  - listen on the specified network interface
  -r file   - read offline pcap data from a given file
  -p        - put the listening interface in promiscuous mode
  -L        - list all available interfaces

Operating mode and output settings:

  -f file   - read fingerprint database from 'file' (/etc/p0f/p0f.fp)
  -o file   - write information to the specified log file
  -s name   - answer to API queries at a named unix socket
  -u user   - switch to the specified unprivileged account and chroot
  -d        - fork into background (requires -o or -s)

Performance-related options:

  -S limit  - limit number of parallel API connections (20)
  -t c,h    - set connection / host cache age limits (30s,120m)
  -m c,h    - cap the number of active connections / hosts (1000,10000)

Optional filter expressions (man tcpdump) can be specified in the command
line to prevent p0f from looking at incidental network traffic.

Problems? You can reach the author at <lcamtuf@coredump.cx>.
# Listen on port 443 of the eth0 interface and write the log to p0f3.log
p0f -f /etc/p0f/p0f.fp -o ./p0f3.log -i eth0 'port 443'
# Output
.-[ 172.18.37.42/53464 -> 163.177.83.164/443 (syn) ]-
|
| client   = 172.18.37.42/53464
| os       = Linux 3.11 and newer
| dist     = 0
| params   = none
| raw_sig  = 4:64+0:0:1460:mss*20,7:mss,sok,ts,nop,ws:df,id+:0
|
`----

.-[ 172.18.37.42/53464 -> 163.177.83.164/443 (mtu) ]-
|
| client   = 172.18.37.42/53464
| link     = Ethernet or modem
| raw_mtu  = 1500
|
`----

.-[ 172.18.37.42/53464 -> 163.177.83.164/443 (syn+ack) ]-
|
| server   = 163.177.83.164/443
| os       = Linux 3.x
| dist     = 12
| params   = tos:0x05
| raw_sig  = 4:52+12:0:1440:mss*10,7:mss,nop,nop,sok,nop,ws:df:0
|
`----

.-[ 172.18.37.42/53464 -> 163.177.83.164/443 (mtu) ]-
|
| server   = 163.177.83.164/443
| link     = IPIP or SIT
| raw_mtu  = 1480
|
`----

.-[ 222.131.36.189/50664 -> 172.18.37.42/443 (syn) ]-
|
| client   = 222.131.36.189/50664
| os       = Mac OS X
| dist     = 12
| params   = generic fuzzy tos:0x05
| raw_sig  = 4:52+12:0:1420:65535,7:mss,nop,ws,nop,nop,ts,sok,eol+1:df,ecn:0
|
`----
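The `raw_sig` lines above pack the whole fingerprint into one string. As a rough sketch, the snippet below splits such a string into named fields, following the layout described in the p0f v3 README (`ver:ittl:olen:mss:wsize,scale:olayout:quirks:pclass`). Reading the second field as "observed TTL + guessed distance" is my interpretation of the samples above (`52+12` next to `dist = 12`). The class and function names are hypothetical, and the parser only handles the forms that appear in this output.

```python
from typing import List, NamedTuple


class RawSig(NamedTuple):
    ip_version: int
    observed_ttl: int
    distance: int
    ip_option_len: int
    mss: str
    window_size: str
    window_scale: str
    option_layout: List[str]
    quirks: List[str]
    payload_class: str

    @property
    def initial_ttl(self) -> int:
        # Observed TTL plus guessed hop count.
        return self.observed_ttl + self.distance


def parse_raw_sig(raw_sig: str) -> RawSig:
    """Split a p0f raw_sig string into its eight colon-separated fields."""
    fields = raw_sig.strip().split(":")
    if len(fields) != 8:
        raise ValueError(f"expected 8 fields, got {len(fields)}: {raw_sig!r}")
    ver, ttl, olen, mss, window, layout, quirks, pclass = fields
    observed, _, dist = ttl.partition("+")
    wsize, _, wscale = window.partition(",")
    return RawSig(
        ip_version=int(ver),
        observed_ttl=int(observed),
        distance=int(dist or 0),
        ip_option_len=int(olen),
        mss=mss,
        window_size=wsize,
        window_scale=wscale,
        option_layout=layout.split(",") if layout else [],
        quirks=quirks.split(",") if quirks else [],
        payload_class=pclass,
    )


if __name__ == "__main__":
    sig = parse_raw_sig("4:64+0:0:1460:mss*20,7:mss,sok,ts,nop,ws:df,id+:0")
    print(sig.initial_ttl, sig.window_size, sig.option_layout, sig.quirks)
```

Comparing the Linux and Mac OS X samples this way shows that both start from a TTL of 64 but differ in window size and in the order of TCP options, which is what lets p0f tell them apart.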

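I sent HTTP requests to a test server. Although the User-Agent claimed to be Windows, the stack fingerprint still revealed the operating system that actually made the request. Forging the header alone is like covering your own ears while stealing a bell: it fools nobody but yourself. Stack fingerprinting therefore makes anti-crawler detection quite easy, and I concluded that it is one of Amazon's anti-crawler strategies.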
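The check a server could run is simple to sketch: compare the operating system claimed in the User-Agent with the one p0f reports for the same connection. The code below is only an illustration under my own assumptions. The function names are hypothetical, the keyword matching is deliberately crude, and I do not know how Amazon actually implements this.

```python
def os_family_from_user_agent(user_agent: str) -> str:
    """Very rough OS family claimed by a browser User-Agent string."""
    ua = user_agent.lower()
    if "windows" in ua:
        return "windows"
    if "mac os x" in ua or "macintosh" in ua:
        return "mac"
    if "linux" in ua or "x11" in ua:
        return "linux"
    return "unknown"


def os_family_from_p0f(os_label: str) -> str:
    """Very rough OS family from a p0f 'os =' label."""
    label = os_label.strip().lower()
    if label.startswith("windows"):
        return "windows"
    if label.startswith("mac os"):
        return "mac"
    if label.startswith("linux"):
        return "linux"
    return "unknown"


def looks_spoofed(user_agent: str, p0f_os_label: str) -> bool:
    """True when both sides are known and they disagree."""
    claimed = os_family_from_user_agent(user_agent)
    observed = os_family_from_p0f(p0f_os_label)
    if "unknown" in (claimed, observed):
        return False  # not enough evidence either way
    return claimed != observed


if __name__ == "__main__":
    ua = ("Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 "
          "(KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36")
    # The User-Agent and the p0f label both come from the examples in this post.
    print(looks_spoofed(ua, "Linux 3.11 and newer"))
```

A mismatch is a signal, not proof: a request that passes through a proxy or load balancer carries the fingerprint of that intermediate machine, not of the original client.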

Active stack fingerprinting

# Install
yum install nmap
# Help
[root@root ~]# nmap -h
Nmap 6.40 ( http://nmap.org )
Usage: nmap [Scan Type(s)] [Options] {target specification}
TARGET SPECIFICATION:
  Can pass hostnames, IP addresses, networks, etc.
  Ex: scanme.nmap.org, microsoft.com/24, 192.168.0.1; 10.0.0-255.1-254
  -iL <inputfilename>: Input from list of hosts/networks
  -iR <num hosts>: Choose random targets
  --exclude <host1[,host2][,host3],...>: Exclude hosts/networks
  --excludefile <exclude_file>: Exclude list from file
HOST DISCOVERY:
  -sL: List Scan - simply list targets to scan
  -sn: Ping Scan - disable port scan
  -Pn: Treat all hosts as online -- skip host discovery
  -PS/PA/PU/PY[portlist]: TCP SYN/ACK, UDP or SCTP discovery to given ports
  -PE/PP/PM: ICMP echo, timestamp, and netmask request discovery probes
  -PO[protocol list]: IP Protocol Ping
  -n/-R: Never do DNS resolution/Always resolve [default: sometimes]
  --dns-servers <serv1[,serv2],...>: Specify custom DNS servers
  --system-dns: Use OS's DNS resolver
  --traceroute: Trace hop path to each host
SCAN TECHNIQUES:
  -sS/sT/sA/sW/sM: TCP SYN/Connect()/ACK/Window/Maimon scans
  -sU: UDP Scan
  -sN/sF/sX: TCP Null, FIN, and Xmas scans
  --scanflags <flags>: Customize TCP scan flags
  -sI <zombie host[:probeport]>: Idle scan
  -sY/sZ: SCTP INIT/COOKIE-ECHO scans
  -sO: IP protocol scan
  -b <FTP relay host>: FTP bounce scan
PORT SPECIFICATION AND SCAN ORDER:
  -p <port ranges>: Only scan specified ports
    Ex: -p22; -p1-65535; -p U:53,111,137,T:21-25,80,139,8080,S:9
  -F: Fast mode - Scan fewer ports than the default scan
  -r: Scan ports consecutively - don't randomize
  --top-ports <number>: Scan <number> most common ports
  --port-ratio <ratio>: Scan ports more common than <ratio>
SERVICE/VERSION DETECTION:
  -sV: Probe open ports to determine service/version info
  --version-intensity <level>: Set from 0 (light) to 9 (try all probes)
  --version-light: Limit to most likely probes (intensity 2)
  --version-all: Try every single probe (intensity 9)
  --version-trace: Show detailed version scan activity (for debugging)
SCRIPT SCAN:
  -sC: equivalent to --script=default
  --script=<Lua scripts>: <Lua scripts> is a comma separated list of 
           directories, script-files or script-categories
  --script-args=<n1=v1,[n2=v2,...]>: provide arguments to scripts
  --script-args-file=filename: provide NSE script args in a file
  --script-trace: Show all data sent and received
  --script-updatedb: Update the script database.
  --script-help=<Lua scripts>: Show help about scripts.
           <Lua scripts> is a comma separted list of script-files or
           script-categories.
OS DETECTION:
  -O: Enable OS detection
  --osscan-limit: Limit OS detection to promising targets
  --osscan-guess: Guess OS more aggressively
TIMING AND PERFORMANCE:
  Options which take <time> are in seconds, or append 'ms' (milliseconds),
  's' (seconds), 'm' (minutes), or 'h' (hours) to the value (e.g. 30m).
  -T<0-5>: Set timing template (higher is faster)
  --min-hostgroup/max-hostgroup <size>: Parallel host scan group sizes
  --min-parallelism/max-parallelism <numprobes>: Probe parallelization
  --min-rtt-timeout/max-rtt-timeout/initial-rtt-timeout <time>: Specifies
      probe round trip time.
  --max-retries <tries>: Caps number of port scan probe retransmissions.
  --host-timeout <time>: Give up on target after this long
  --scan-delay/--max-scan-delay <time>: Adjust delay between probes
  --min-rate <number>: Send packets no slower than <number> per second
  --max-rate <number>: Send packets no faster than <number> per second
FIREWALL/IDS EVASION AND SPOOFING:
  -f; --mtu <val>: fragment packets (optionally w/given MTU)
  -D <decoy1,decoy2[,ME],...>: Cloak a scan with decoys
  -S <IP_Address>: Spoof source address
  -e <iface>: Use specified interface
  -g/--source-port <portnum>: Use given port number
  --data-length <num>: Append random data to sent packets
  --ip-options <options>: Send packets with specified ip options
  --ttl <val>: Set IP time-to-live field
  --spoof-mac <mac address/prefix/vendor name>: Spoof your MAC address
  --badsum: Send packets with a bogus TCP/UDP/SCTP checksum
OUTPUT:
  -oN/-oX/-oS/-oG <file>: Output scan in normal, XML, s|<rIpt kIddi3,
     and Grepable format, respectively, to the given filename.
  -oA <basename>: Output in the three major formats at once
  -v: Increase verbosity level (use -vv or more for greater effect)
  -d: Increase debugging level (use -dd or more for greater effect)
  --reason: Display the reason a port is in a particular state
  --open: Only show open (or possibly open) ports
  --packet-trace: Show all packets sent and received
  --iflist: Print host interfaces and routes (for debugging)
  --log-errors: Log errors/warnings to the normal-format output file
  --append-output: Append to rather than clobber specified output files
  --resume <filename>: Resume an aborted scan
  --stylesheet <path/URL>: XSL stylesheet to transform XML output to HTML
  --webxml: Reference stylesheet from Nmap.Org for more portable XML
  --no-stylesheet: Prevent associating of XSL stylesheet w/XML output
MISC:
  -6: Enable IPv6 scanning
  -A: Enable OS detection, version detection, script scanning, and traceroute
  --datadir <dirname>: Specify custom Nmap data file location
  --send-eth/--send-ip: Send using raw ethernet frames or IP packets
  --privileged: Assume that the user is fully privileged
  --unprivileged: Assume the user lacks raw socket privileges
  -V: Print version number
  -h: Print this help summary page.
EXAMPLES:
  nmap -v -A scanme.nmap.org
  nmap -v -sn 192.168.0.0/16 10.0.0.0/8
  nmap -v -iR 10000 -Pn -p 80
SEE THE MAN PAGE (http://nmap.org/book/man.html) FOR MORE OPTIONS AND EXAMPLES
root@MHAnode04:~# nmap -O 50.2.83.130

Starting Nmap 7.01 ( https://nmap.org ) at 2020-06-04 04:11 EDT
Nmap scan report for 50.2.83.130
Host is up (0.15s latency).
Not shown: 999 closed ports
PORT   STATE SERVICE
22/tcp open  ssh
Aggressive OS guesses: Linux 2.6.32 - 3.13 (96%), Linux 3.2 - 4.0 (94%), Linux 2.6.32 - 3.10 (93%), HP P2000 G3 NAS device (93%), Ubiquiti AirMax NanoStation WAP (Linux 2.6.32) (92%), Linux 2.6.32 (92%), Linux 3.7 (92%), Infomir MAG-250 set-top box (92%), Linux 2.6.23 - 2.6.38 (91%), Linux 2.6.32 - 3.1 (91%)
No exact OS matches for host (test conditions non-ideal).
Network Distance: 18 hops

OS detection performed. Please report any incorrect results at https://nmap.org/submit/ .
Nmap done: 1 IP address (1 host up) scanned in 9.81 seconds
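When nmap finds no exact match, it prints a ranked list of guesses like the "Aggressive OS guesses" line above. If you want to post-process that line, a small sketch such as the following can turn it into (name, confidence) pairs. The regular expression and the function name `parse_os_guesses` are my own, written against the line format shown in this output only; for serious use, nmap's XML output (`-oX`) is the more robust source.

```python
import re
from typing import List, Tuple

# A guess looks like "<name> (<NN>%)", separated from the next one by ", ".
# The name itself may contain parentheses, e.g. "... WAP (Linux 2.6.32)".
GUESS_RE = re.compile(r"(.+?) \((\d+)%\)(?:, |$)")
PREFIX = "Aggressive OS guesses: "


def parse_os_guesses(line: str) -> List[Tuple[str, int]]:
    """Return [(os_name, percent), ...] from an 'Aggressive OS guesses' line."""
    line = line.strip()
    if not line.startswith(PREFIX):
        return []
    return [(name, int(pct)) for name, pct in GUESS_RE.findall(line[len(PREFIX):])]


if __name__ == "__main__":
    sample = ("Aggressive OS guesses: Linux 2.6.32 - 3.13 (96%), "
              "Linux 3.2 - 4.0 (94%), HP P2000 G3 NAS device (93%)")
    for name, pct in parse_os_guesses(sample):
        print(f"{pct:3d}%  {name}")
```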

References:

https://blog.csdn.net/he_and/article/details/88350861
https://blog.csdn.net/freeking101/article/details/72962349
https://www.ixueshu.com/document/977e456638c1f9cf.html
https://www.doc88.com/p-8846033523821.html
https://wenku.baidu.com/view/6bdb6c2bff4733687e21af45b307e87100f6f878.html
https://blog.csdn.net/whatday/article/details/105517801
https://wenku.baidu.com/view/c6711182e53a580216fcfe75.html
http://shouce.jb51.net/kali-linux-tutorial/21.html
https://baike.baidu.com/item/%E5%8D%8F%E8%AE%AE%E6%A0%88%E6%8C%87%E7%BA%B9/7113052?fr=aladdin