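Recently, while running an experiment to verify whether particular websites are deployed behind a particular CDN, I ran into a problem: when a site only supports HTTPS, how do you establish the connection by IP address rather than by domain name?
- Connecting by domain name
This is simple, and there are many ways to do it. In Python 3 you can use socket, urllib3, requests, or http.client. I usually use requests, because requests (like urllib3) follows 30x responses automatically. This behavior can be turned off: through the allow_redirects parameter in requests and the redirect parameter in urllib3. For example: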
#!/usr/bin/env python3.5.2
# -*- coding:utf-8 -*-
import requests

domain = 'liwz11.com'
headers = { 'user-agent' : 'Python 3.x' }

r = requests.get('http://' + domain + '/', headers=headers)
print(r.status_code)

r = requests.get('https://' + domain + '/', headers=headers)
print(r.status_code)
- Connecting by IP
If the site supports HTTP (port 80), you can simply connect to the IP and put the domain in the Host header:
#!/usr/bin/env python3.5.2
# -*- coding:utf-8 -*-
import requests
domain = 'liwz11.com'
ip = '104.27.176.173'
headers = { 'user-agent' : 'Python 3.x', 'host': domain }
r = requests.get('http://' + ip + '/', headers=headers)
print(r.status_code)
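To see why the Host header matters when connecting by IP, here is a minimal standard-library sketch. It starts a throwaway local server that echoes back the Host header it received, then connects to 127.0.0.1 while presenting a different host name. The handler `EchoHostHandler`, the helper names, and the host name `example.test` are all made up for illustration; they are not part of the experiment above.

```python
import http.client
import http.server
import threading


class EchoHostHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical handler: replies with the Host header it received."""

    def do_GET(self):
        body = self.headers.get('Host', '').encode('ascii')
        self.send_response(200)
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def host_seen_by_server(ip, port, host_header):
    """Connect to ip:port, but send host_header as the Host header."""
    conn = http.client.HTTPConnection(ip, port, timeout=5)
    try:
        # Supplying our own Host header stops http.client from adding one.
        conn.request('GET', '/', headers={'Host': host_header})
        return conn.getresponse().read().decode('ascii')
    finally:
        conn.close()


def demo():
    # Port 0 lets the OS pick a free port.
    server = http.server.HTTPServer(('127.0.0.1', 0), EchoHostHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        return host_seen_by_server('127.0.0.1', server.server_port, 'example.test')
    finally:
        server.shutdown()
        server.server_close()
```

Calling `demo()` returns the host name the server saw, which is the value from the header rather than the IP that was actually dialed. This is what lets a server or CDN hosting many sites on one IP pick the right one over plain HTTP.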
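If the target site only supports HTTPS (port 443), you cannot simply replace "http" with "https" in the code above; doing so raises an error. During the SSL handshake, the client checks whether the server_hostname of the current request URL appears in the list of names the server certificate is valid for (its Subject Alternative Names). Suppose that list contains "liwz11.com". Connecting by domain name succeeds, because server_hostname is automatically set to the domain, "liwz11.com". Connecting by IP fails, because server_hostname is automatically set to the IP, "104.27.176.173", which is not in the list. The TLS layer's server_hostname check against the certificate fails, and so the connection is never established.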
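The check described above can be illustrated with a simplified sketch. This is not the matching code used by Python's ssl module; it is a reduced version that handles only exact names, a single leading `*.` wildcard, and IP entries. The function names are hypothetical, and the wildcard entry in the usage example is an assumption, since only "liwz11.com" is known to be on the certificate.

```python
import ipaddress


def _is_ip(value):
    """True if value parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(value)
        return True
    except ValueError:
        return False


def name_matches(pattern, hostname):
    """Simplified DNS-name match: exact, or one leading '*.' wildcard label."""
    pattern, hostname = pattern.lower(), hostname.lower()
    if pattern.startswith('*.'):
        # The wildcard stands for exactly one label.
        head, _, tail = hostname.partition('.')
        return bool(head) and tail == pattern[2:]
    return pattern == hostname


def hostname_in_cert(server_hostname, dns_names, ip_addresses=()):
    """Would server_hostname pass against these certificate entries?"""
    if _is_ip(server_hostname):
        # An IP is only compared with IP entries, never with DNS names.
        wanted = ipaddress.ip_address(server_hostname)
        return any(ipaddress.ip_address(ip) == wanted for ip in ip_addresses)
    return any(name_matches(p, server_hostname) for p in dns_names)
```

With an assumed name list of `['liwz11.com', '*.liwz11.com']`, `hostname_in_cert('liwz11.com', ...)` passes while `hostname_in_cert('104.27.176.173', ...)` does not, which mirrors the failure seen when connecting by IP.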
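Clearly, server_hostname has to be specified before the SSL handshake takes place. The only way to find out whether the four Python libraries mentioned above expose a suitable interface is to read their source code.
1. The http.client library
Take a look at the http.client source: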
# ......Lib/http/client.py
class HTTPConnection:
    def __init__(self, host, port=None, ...):
        (self.host, self.port) = self._get_hostport(host, port)

    def connect(self):
        self.sock = self._create_connection((self.host,self.port), self.timeout, self.source_address)
        self.sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        if self._tunnel_host:
            self._tunnel()

    def send(self, data):
        if self.sock is None:
            self.connect()

class HTTPSConnection(HTTPConnection):
    def connect(self):
        "Connect to a host on a given (SSL) port."
        super().connect()
        if self._tunnel_host:
            server_hostname = self._tunnel_host
        else:
            server_hostname = self.host
        self.sock = self._context.wrap_socket(self.sock, server_hostname=server_hostname)
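As you can see, before the SSL handshake http.client sets server_hostname to the connection's host attribute: the domain if you connect by domain name, the IP if you connect by IP. So we can define a subclass of HTTPSConnection and override connect() so that, before the SSL handshake, server_hostname is set to a value we pass in. The final code is: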
#!/usr/bin/env python3.5.2
# -*- coding:utf-8 -*-
import http.client
import socket

class MyHTTPSConnection(http.client.HTTPSConnection):
    def __init__(self, *args, server_hostname=None, **kwargs):
        # Name to use for the TLS handshake instead of self.host.
        self.server_hostname = server_hostname
        http.client.HTTPSConnection.__init__(self, *args, **kwargs)

    def connect(self):
        # Open the TCP connection to self.host (the IP); proxy tunnelling is not handled here.
        self.sock = self._create_connection((self.host,self.port), self.timeout, self.source_address)
        self.sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        # Do the SSL handshake with our own server_hostname.
        self.sock = self._context.wrap_socket(self.sock, server_hostname=self.server_hostname)

domain = 'liwz11.com'
ip = '104.27.176.173'
headers = { 'user-agent' : 'Python 3.x', 'host' : domain }

conn = MyHTTPSConnection(ip, server_hostname=domain)
conn.request("GET", "/", headers=headers)
r = conn.getresponse()
print(r.status)
Note that, unlike requests, http.client does not follow 30x responses automatically.
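If redirects do need to be followed with http.client, the loop has to be written by hand. The following is a rough sketch of one way to do that, not something from the original experiment: `fetch_once`, `follow_redirects`, and the `max_hops` limit are hypothetical names, and `fetch_once` connects by host name in the ordinary way. To connect by IP, the connection class would need to be swapped for something like the MyHTTPSConnection subclass above.

```python
import http.client
from urllib.parse import urljoin, urlsplit

REDIRECT_CODES = {301, 302, 303, 307, 308}


def fetch_once(url):
    """Issue one GET with http.client; return (status, Location or None)."""
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == 'https'
                else http.client.HTTPConnection)
    conn = conn_cls(parts.hostname, parts.port, timeout=10)
    try:
        path = parts.path or '/'
        if parts.query:
            path += '?' + parts.query
        conn.request('GET', path)
        resp = conn.getresponse()
        resp.read()  # drain the body so the connection closes cleanly
        return resp.status, resp.getheader('Location')
    finally:
        conn.close()


def follow_redirects(url, fetch=fetch_once, max_hops=5):
    """Follow 30x responses by hand; return (final_url, final_status)."""
    for _ in range(max_hops + 1):
        status, location = fetch(url)
        if status not in REDIRECT_CODES or not location:
            return url, status
        url = urljoin(url, location)  # Location may be relative
    raise RuntimeError('too many redirects')
```

The `fetch` parameter is there so the loop can be exercised without network access by passing in a stand-in function.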
2. The urllib3 library
The example from the official documentation looks like this:
#!/usr/bin/env python3.5.2
# -*- coding:utf-8 -*-
import urllib3

domain = 'liwz11.com'
headers = { 'user-agent' : 'Python 3.x'}

pool = urllib3.PoolManager()
r = pool.request('GET', 'https://' + domain + '/', headers=headers)
print(r.status)
Starting from the PoolManager class, we can trace through the urllib3 source:
# ......src/urllib3/poolmanager.py
class PoolManager(RequestMethods):
    def __init__(self, num_pools=10, headers=None, **connection_pool_kw):
        RequestMethods.__init__(self, headers)
        self.connection_pool_kw = connection_pool_kw

    def urlopen(self, method, url, redirect=True, **kw):
        u = parse_url(url)
        conn = self.connection_from_host(u.host, port=u.port, scheme=u.scheme)
        response = conn.urlopen(method, u.request_uri, **kw)
        redirect_location = redirect and response.get_redirect_location()
        ......
        if not redirect_location:
            return response
        else:
            return self.urlopen(method, redirect_location, **kw)

    def connection_from_host(self, host, port=None, ...):
        return self.connection_from_context(request_context)

    def connection_from_context(self, request_context):
        return self.connection_from_pool_key(...)

    def connection_from_pool_key(self, ...):
        pool = self._new_pool(scheme, host, port, request_context)
        return pool

    def _new_pool(self, scheme, host, port, request_context=None):
        # {"http": HTTPConnectionPool, "https": HTTPSConnectionPool}
        pool_cls = self.pool_classes_by_scheme[scheme]
        return pool_cls(host, port, **request_context)

# ......src/urllib3/request.py
class RequestMethods(object):
    def urlopen(self, method, url, ...):  # Abstract
        raise NotImplementedError(...)

    def request(self, method, url, ...):
        return self.request_encode_url(method, url, ...)

    def request_encode_url(self, method, url, ...):
        return self.urlopen(method, url, ...)

# ......src/urllib3/connectionpool.py
class HTTPSConnectionPool(HTTPConnectionPool):
    scheme = "https"
    ConnectionCls = HTTPSConnection

    def urlopen(self, method, url, ...):
        conn = self._get_conn(timeout=pool_timeout)

# ......src/urllib3/connection.py
class HTTPSConnection(HTTPConnection):
    def __init__(self, host, port=None, ..., server_hostname=None, **kw):
        self.server_hostname = server_hostname

    def connect(self):
        conn = self._new_conn()
        hostname = self.host
        server_hostname = hostname
        if self.server_hostname is not None:
            server_hostname = self.server_hostname
        self.sock = ssl_wrap_socket(sock=conn, ..., server_hostname=server_hostname)
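As you can see, the HTTPSConnection class allows server_hostname to be set. Because HTTPSConnectionPool and PoolManager wrap it and pass their extra keyword arguments down, the value can be supplied as a parameter when constructing PoolManager. The final code is: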
#!/usr/bin/env python3.5.2
# -*- coding:utf-8 -*-
import urllib3

domain = 'liwz11.com'
ip = '104.27.176.173'
headers = { 'user-agent' : 'Python 3.x', 'host' : domain }

pool = urllib3.PoolManager(server_hostname=domain)
r = pool.request('GET', 'https://' + ip + '/', headers=headers)
print(r.status)
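Setting server_hostname with urllib3 is convenient and concise, and urllib3 also follows 30x responses automatically.
Note that older versions of urllib3 contain no code for setting server_hostname. For example, the default installations of Python 3.5.2 + urllib3 1.22 and Python 3.6.8 + urllib3 1.22 both lack it. The code above therefore requires upgrading urllib3 to a recent version, for example: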
who@ubuntu:~/Desktop$ pip3 list | grep urllib3
urllib3 (1.22)
who@ubuntu:~/Desktop$ sudo pip3 install urllib3 --upgrade
......
Successfully installed urllib3-1.25.8
who@ubuntu:~/Desktop$ pip3 list | grep urllib3
urllib3 (1.25.8)
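Rather than comparing version numbers, one option is to check at runtime whether the installed urllib3 declares a server_hostname parameter at all. The sketch below is a heuristic and rests on an assumption: that versions supporting the feature list server_hostname explicitly in the signature of `HTTPSConnection.__init__`, as in the source excerpt above. The helper names are hypothetical.

```python
import inspect


def accepts_keyword(func, name):
    """True if func declares name as an explicit parameter (not just **kwargs)."""
    param = inspect.signature(func).parameters.get(name)
    return param is not None and param.kind in (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    )


def urllib3_supports_server_hostname():
    """Return True or False, or None if urllib3 is not installed."""
    try:
        from urllib3.connection import HTTPSConnection
    except ImportError:
        return None
    # Assumption: support shows up as a named parameter on __init__.
    return accepts_keyword(HTTPSConnection.__init__, 'server_hostname')
```

A script could call `urllib3_supports_server_hostname()` at startup and print an upgrade hint when it returns False, instead of failing later with a less obvious error.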
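3. The requests library
Reading the requests source, I could not find a way to do this. However, a similar problem has been discussed online, with the following code offered as a solution. After upgrading requests to 2.23.0 and urllib3 to 1.25.8, this code works.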
#!/usr/bin/env python3.5.2
# -*- coding:utf-8 -*-
import requests, urllib3

# Only needed because the request below is sent with verify=False.
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

class CustomAdapter(requests.adapters.HTTPAdapter):
    def __init__(self, server_hostname, *args, **kwargs):
        # Must be set first: HTTPAdapter.__init__ calls init_poolmanager().
        self.server_hostname = server_hostname
        requests.adapters.HTTPAdapter.__init__(self, *args, **kwargs)

    def init_poolmanager(self, connections, maxsize, block=False, **pool_kwargs):
        # Let the stock implementation build the PoolManager, so that connections and
        # maxsize are passed by keyword, and add server_hostname to its keyword arguments.
        pool_kwargs['server_hostname'] = self.server_hostname
        super().init_poolmanager(connections, maxsize, block=block, **pool_kwargs)

domain = 'liwz11.com'
ip = '104.27.176.173'
headers = { 'user-agent' : 'Python 3.x', 'host' : domain }

s = requests.Session()
s.mount('https://' + ip + '/', CustomAdapter(server_hostname=domain))

#r = s.request('GET', 'https://' + ip + '/', headers=headers, stream=True)
r = s.request('GET', 'https://' + ip + '/', headers=headers, stream=True, verify=False)

# With stream=True the socket is still open; dig it out of the raw response (private attributes).
fp = r.raw._fp.fp
sock = fp.raw._sock if hasattr(fp, 'raw') else fp._sock
remote_ip = sock.getpeername()[0]
print(remote_ip)
print(r.status_code)
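Another benefit of requests is that you can set stream to True, so that only the response headers are downloaded and not the body. The TCP connection then stays open and can be used to read the server's IP address.
4. The socket library
This should be achievable with raw sockets, in much the same way as with http.client, but you would have to build and parse the HTTP messages entirely by yourself, so I did not try it.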
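For reference, a rough sketch of what the socket approach might look like with the standard socket and ssl modules. It is an illustration only and was not part of the original experiment. The helper names are hypothetical, and the HTTP handling is deliberately minimal: one GET with "Connection: close", read until EOF, parse only the status line.

```python
import socket
import ssl


def build_get_request(host, path='/', user_agent='Python 3.x'):
    """Hand-build a minimal HTTP/1.1 GET request as bytes."""
    lines = [
        'GET {} HTTP/1.1'.format(path),
        'Host: {}'.format(host),
        'User-Agent: {}'.format(user_agent),
        'Connection: close',  # ask the server to close, so we can read to EOF
        '', '',               # blank line that ends the header section
    ]
    return '\r\n'.join(lines).encode('ascii')


def parse_status_code(raw_response):
    """Extract the status code from the first line, e.g. b'HTTP/1.1 200 OK'."""
    status_line = raw_response.split(b'\r\n', 1)[0]
    return int(status_line.split()[1])


def https_get_by_ip(ip, domain, port=443, timeout=10):
    """Connect to ip, handshake as domain, and return the HTTP status code."""
    context = ssl.create_default_context()
    with socket.create_connection((ip, port), timeout=timeout) as sock:
        # server_hostname is the name sent in the handshake and checked
        # against the certificate, independent of the IP we dialed.
        with context.wrap_socket(sock, server_hostname=domain) as tls:
            tls.sendall(build_get_request(domain))
            chunks = []
            while True:
                data = tls.recv(4096)
                if not data:
                    break
                chunks.append(data)
    return parse_status_code(b''.join(chunks))
```

With the values used throughout this post, the call would be `https_get_by_ip('104.27.176.173', 'liwz11.com')`. Chunked bodies, redirects, and keep-alive are all left out, which is the amount of hand-written HTTP handling that makes this route less attractive than the others.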