
Establishing HTTPS Connections by IP Address in Python 3

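Recently, while running an experiment to verify whether a particular website was deployed behind a particular CDN, I ran into a problem: when the site only supports HTTPS, how do you establish the connection by IP address rather than by domain name?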

  • Connecting by domain name

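This is simple, and there are many ways to do it: in Python 3 you can use socket, urllib3, requests, or http.client. I usually use requests, because both requests and urllib3 follow 30x responses automatically (this can be turned off: in requests via the allow_redirects parameter, in urllib3 via the redirect parameter). For example: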

#!/usr/bin/env python3
# -*- coding:utf-8 -*-

import requests

domain = 'liwz11.com'
headers = { 'user-agent' : 'Python 3.x' }

r = requests.get('http://' + domain + '/', headers=headers)
print(r.status_code)

r = requests.get('https://' + domain + '/', headers=headers)
print(r.status_code)

  • Connecting by IP

If the site supports HTTP (port 80), you can simply connect to the IP directly and pass the domain name in the Host header:

#!/usr/bin/env python3
# -*- coding:utf-8 -*-

import requests

domain = 'liwz11.com'
ip = '104.27.176.173'
headers = { 'user-agent' : 'Python 3.x', 'host': domain }

r  = requests.get('http://' + ip + '/', headers=headers)
print(r.status_code)

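If the target site only supports HTTPS (port 443), you cannot simply replace "http" with "https" in the code above; the request will fail with an error. During the SSL handshake, the client checks whether the server_hostname of the requested URL appears in the list of alternative names in the server certificate. For example, suppose that list contains "liwz11.com". Connecting by domain name succeeds, because server_hostname is automatically set to the domain name, "liwz11.com". Connecting by IP fails, because server_hostname is automatically set to the IP, "104.27.176.173", which is not in the list. The TLS layer's server_hostname check against the certificate then fails, and the connection is never established.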
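To make the check concrete, here is a deliberately simplified sketch of comparing server_hostname with the certificate's name list. The function name `hostname_in_cert_names` and the wildcard entry in the list are assumptions made for illustration. The real check happens inside the TLS library and follows more rules than this.

```python
def hostname_in_cert_names(server_hostname, cert_names):
    """Simplified illustration of matching a hostname against certificate names.

    Not the real TLS implementation: it handles only exact names and a
    single leading wildcard label such as '*.example.com'.
    """
    host = server_hostname.lower()
    for name in cert_names:
        name = name.lower()
        if name == host:
            return True
        if name.startswith('*.'):
            # The wildcard stands for exactly one left-most label
            first_label, _, rest = host.partition('.')
            if first_label and rest == name[2:]:
                return True
    return False


# Hypothetical name list for the example site
cert_names = ['liwz11.com', '*.liwz11.com']

print(hostname_in_cert_names('liwz11.com', cert_names))      # connecting by domain
print(hostname_in_cert_names('104.27.176.173', cert_names))  # connecting by IP
```

The first call finds the name in the list and the second does not, which mirrors why the handshake succeeds by domain name and fails by IP.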

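Clearly, server_hostname has to be specified before the SSL handshake takes place. The only way to find out whether the four Python libraries mentioned above provide an interface for this is to read their source code.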

1. The http.client library

Here is the relevant part of the http.client source:

# ......Lib/http/client.py

class HTTPConnection:
    def __init__(self, host, port=None, ...):
        (self.host, self.port) = self._get_hostport(host, port)
    def connect(self):
        self.sock = self._create_connection((self.host,self.port), self.timeout, self.source_address)
        self.sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        if self._tunnel_host:
            self._tunnel()
    def send(self, data):
        if self.sock is None:
            self.connect()

class HTTPSConnection(HTTPConnection):
    def connect(self):
        "Connect to a host on a given (SSL) port."
        super().connect()
        if self._tunnel_host:
            server_hostname = self._tunnel_host
        else:
            server_hostname = self.host
        self.sock = self._context.wrap_socket(self.sock, server_hostname=server_hostname)

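As you can see, before the SSL handshake, http.client sets server_hostname to the connection's host attribute: if you connect by domain name, host is the domain name; if you connect by IP, host is the IP. So we can define a class that inherits from HTTPSConnection and overrides connect() so that, before the SSL handshake, server_hostname is set to a value we pass in. The final code is as follows: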

#!/usr/bin/env python3
# -*- coding:utf-8 -*-

import http.client
import socket

class MyHTTPSConnection(http.client.HTTPSConnection):
    def __init__(self, *args, server_hostname=None, **kwargs):
        self.server_hostname = server_hostname
        http.client.HTTPSConnection.__init__(self, *args, **kwargs)
    def connect(self):
        self.sock = self._create_connection((self.host,self.port), self.timeout, self.source_address)
        self.sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        self.sock = self._context.wrap_socket(self.sock, server_hostname=self.server_hostname)

domain = 'liwz11.com'
ip = '104.27.176.173'
headers = { 'user-agent' : 'Python 3.x', 'host' : domain }

conn = MyHTTPSConnection(ip, server_hostname=domain)
conn.request("GET", "/", headers=headers)
r = conn.getresponse()

print(r.status)

Note that, unlike requests, http.client does not automatically follow 30x responses.
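If redirects matter for your measurement, they have to be followed by hand. Below is a minimal sketch of the first step, working out where a 30x response points. The helper name `redirect_target` and the sample URLs are made up for illustration, and the set of status codes is an assumption about which redirects you want to follow.

```python
from urllib.parse import urljoin

# Assumption: these are the redirect codes worth following
REDIRECT_STATUSES = {301, 302, 303, 307, 308}


def redirect_target(current_url, status, location):
    """Return the absolute URL to request next, or None if not a redirect."""
    if status not in REDIRECT_STATUSES or not location:
        return None
    # Location may be relative, so resolve it against the current URL
    return urljoin(current_url, location)


print(redirect_target('https://liwz11.com/', 301, '/blog/'))
print(redirect_target('https://liwz11.com/', 200, None))
```

With http.client, the status would come from r.status and the location from r.getheader('Location'). If the target is on a different host, the next connection may need a different IP and server_hostname.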

2. The urllib3 library

The example from the official documentation is as follows:

#!/usr/bin/env python3
# -*- coding:utf-8 -*-

import urllib3

domain = 'liwz11.com'
headers = { 'user-agent' : 'Python 3.x'}

pool = urllib3.PoolManager()
r = pool.request('GET', 'https://' + domain + '/', headers=headers)
print(r.status)

Starting from the PoolManager class, we can trace through the urllib3 source:

# ......src/urllib3/poolmanager.py
class PoolManager(RequestMethods):
    def __init__(self, num_pools=10, headers=None, **connection_pool_kw):
        RequestMethods.__init__(self, headers)
        self.connection_pool_kw = connection_pool_kw
    def urlopen(self, method, url, redirect=True, **kw):
        u = parse_url(url)
        conn = self.connection_from_host(u.host, port=u.port, scheme=u.scheme)
        response = conn.urlopen(method, u.request_uri, **kw)
        redirect_location = redirect and response.get_redirect_location()
        ......
        if not redirect_location:
            return response
        else:
            return self.urlopen(method, redirect_location, **kw)
    def connection_from_host(self, host, port=None, ...):
        return self.connection_from_context(request_context)
    def connection_from_context(self, request_context):
        return self.connection_from_pool_key(...)
    def connection_from_pool_key(self, ...):
        pool = self._new_pool(scheme, host, port, request_context)
        return pool
    def _new_pool(self, scheme, host, port, request_context=None):
        # {"http": HTTPConnectionPool, "https": HTTPSConnectionPool}
        pool_cls = self.pool_classes_by_scheme[scheme]
        return pool_cls(host, port, **request_context)


# ......src/urllib3/request.py
class RequestMethods(object):
    def urlopen(self, method, url, ...): # Abstract
    def request(self, method, url, ...):
        return self.request_encode_url(method, url, ...)
    def request_encode_url(self, method, url, ...):
        return self.urlopen(method, url, ...)

# ......src/urllib3/connectionpool.py
class HTTPSConnectionPool(HTTPConnectionPool):
    scheme = "https"
    ConnectionCls = HTTPSConnection
    def urlopen(self, method, url, ...):
        conn = self._get_conn(timeout=pool_timeout)

# ......src/urllib3/connection.py
class HTTPSConnection(HTTPConnection):
    def __init__(self, host, port=None, ..., server_hostname=None, **kw):
        self.server_hostname = server_hostname
    def connect(self):
        conn = self._new_conn()
        hostname = self.host
        server_hostname = hostname
        if self.server_hostname is not None:
            server_hostname = self.server_hostname
        self.sock = ssl_wrap_socket(sock=conn, ..., server_hostname=server_hostname)

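As you can see, the HTTPSConnection class allows server_hostname to be set. Although it is wrapped by HTTPSConnectionPool and PoolManager, the value can still be passed in as a parameter, because PoolManager forwards its extra keyword arguments down to the connection. The final code is as follows: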

#!/usr/bin/env python3
# -*- coding:utf-8 -*-

import urllib3

domain = 'liwz11.com'
ip = '104.27.176.173'
headers = { 'user-agent' : 'Python 3.x', 'host' : domain }

pool = urllib3.PoolManager(server_hostname=domain)
r = pool.request('GET', 'https://' + ip + '/', headers=headers)

print(r.status)

Setting server_hostname with urllib3 is convenient and concise, and urllib3 also follows 30x responses automatically.
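If one script has to handle both domain-based and IP-based URLs, the override only needs to be passed when the URL host is an IP literal. The sketch below shows one way to decide that using only the standard library. The helper names `is_ip_literal` and `pool_kwargs_for` are hypothetical and not part of urllib3.

```python
import ipaddress
from urllib.parse import urlsplit


def is_ip_literal(host):
    """Return True if host is an IPv4 or IPv6 address rather than a name."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def pool_kwargs_for(url, domain):
    """Build the extra PoolManager keyword arguments for a given URL."""
    host = urlsplit(url).hostname or ''
    if is_ip_literal(host):
        # Connecting by IP: tell the TLS layer which name to use
        return {'server_hostname': domain}
    # Connecting by name: the default server_hostname is already correct
    return {}


print(pool_kwargs_for('https://104.27.176.173/', 'liwz11.com'))
print(pool_kwargs_for('https://liwz11.com/', 'liwz11.com'))
```

The result could then be unpacked into the constructor, for example urllib3.PoolManager(**pool_kwargs_for(url, domain)).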

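Note that older versions of urllib3 contain no code for setting server_hostname. For example, the default environments Python 3.5.2 + urllib3 (1.22) and Python 3.6.8 + urllib3 (1.22) both lack it. The code above therefore requires upgrading urllib3 to the latest version, for example: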

who@ubuntu:~/Desktop$ pip3 list | grep urllib3
urllib3 (1.22)

who@ubuntu:~/Desktop$ sudo pip3 install urllib3 --upgrade
......
Successfully installed urllib3-1.25.8

who@ubuntu:~/Desktop$ pip3 list | grep urllib3
urllib3 (1.25.8)
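Rather than comparing version numbers, a script can also check whether the installed urllib3 exposes the parameter at all. The sketch below assumes that server_hostname appears in the signature of HTTPSConnection.__init__, as it does in the source excerpt above. The helper name `accepts_keyword` is hypothetical.

```python
import inspect


def accepts_keyword(func, name):
    """Return True if func declares a parameter called name."""
    return name in inspect.signature(func).parameters


try:
    from urllib3.connection import HTTPSConnection
except ImportError:
    # urllib3 is not installed, so there is nothing to inspect
    HTTPSConnection = None

if HTTPSConnection is not None:
    supported = accepts_keyword(HTTPSConnection.__init__, 'server_hostname')
    print('server_hostname supported:', supported)
```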

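3. The requests library

I read the requests source and found no way to do this, but a similar problem has been discussed online, with the following code offered as a solution. After upgrading requests to 2.23.0 and urllib3 to 1.25.8, this code works.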

#!/usr/bin/env python3
# -*- coding:utf-8 -*-

import requests, urllib3

urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

class CustomAdapter(requests.adapters.HTTPAdapter):
    def __init__(self, server_hostname, *args, **kwargs):
        self.server_hostname = server_hostname
        requests.adapters.HTTPAdapter.__init__(self, *args, **kwargs)
    def init_poolmanager(self, *args, **kwargs):
        # Pass server_hostname to urllib3 along with the regular pool keyword arguments
        kwargs['server_hostname'] = self.server_hostname
        requests.adapters.HTTPAdapter.init_poolmanager(self, *args, **kwargs)

domain = 'liwz11.com'
ip = '104.27.176.173'
headers = { 'user-agent' : 'Python 3.x', 'host' : domain }

s = requests.Session()
s.mount('https://' + ip + '/', CustomAdapter(server_hostname=domain))
#r = s.request('GET', 'https://' + ip + '/', headers=headers, stream=True)
# verify=False turns off certificate verification for this request
r = s.request('GET', 'https://' + ip + '/', headers=headers, stream=True, verify=False)

fp = r.raw._fp.fp
sock = fp.raw._sock if hasattr(fp, 'raw') else fp._sock
remote_ip = sock.getpeername()[0]

print(remote_ip)
print(r.status_code)

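Another advantage of requests is that you can set stream to True, which downloads only the response headers and not the body. The TCP connection therefore stays open and can be used to obtain the server's IP address.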

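4. The socket library

This should be possible with socket, using an approach similar to the http.client one, but you would have to implement all of the HTTP message handling yourself, so I did not try it.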
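For reference, a rough sketch of what the socket approach might look like, using the standard ssl module for the handshake. It has not been run against the example site. The function names `build_request`, `parse_status` and `fetch_status` are made up for illustration, and the HTTP handling is reduced to reading the status line.

```python
import socket
import ssl


def build_request(domain, path='/'):
    """Build a minimal HTTP/1.1 GET request as bytes."""
    lines = [
        'GET {} HTTP/1.1'.format(path),
        'Host: {}'.format(domain),
        'User-Agent: Python 3.x',
        'Connection: close',
        '', '',  # blank line that ends the header section
    ]
    return '\r\n'.join(lines).encode('ascii')


def parse_status(response_head):
    """Extract the numeric status code from the start of a raw response."""
    status_line = response_head.split(b'\r\n', 1)[0]
    return int(status_line.split()[1])


def fetch_status(ip, domain, port=443, timeout=10):
    """Connect to ip, run the TLS handshake for domain, return the status code."""
    context = ssl.create_default_context()
    with socket.create_connection((ip, port), timeout=timeout) as raw:
        # server_hostname is given explicitly instead of being derived from ip
        with context.wrap_socket(raw, server_hostname=domain) as tls:
            tls.sendall(build_request(domain))
            # Simplification: assumes the status line arrives in the first read
            head = tls.recv(4096)
    return parse_status(head)


# Needs network access, so it is left commented out:
# print(fetch_status('104.27.176.173', 'liwz11.com'))
```

As with http.client, redirects, chunked bodies and header parsing would all have to be handled by hand.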

 
