StevenPZChan

The challenge continues, and the difficulty is starting to climb.


Level 17 is at romance.html.

  • cookies.jpg
  • The page title is "eat?", the page body is empty, and the source hides nothing either.

Not much to go on, so we obviously start with the image.
It shows some small biscuits, and the file is named cookies. Hm? cookies, as in HTTP cookies?

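A cookie (plural: cookies) is a "small text file": data (usually encrypted) that some websites store on the user's local terminal (client side) in order to identify the user. It was invented in March 1993 by Lou Montulli, a former Netscape employee, and was first defined in RFC 2109. The most widely used cookie standard today is not any of those defined in RFCs, but an extension of the standard drawn up by Netscape.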

From wikipedia.org
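
To make the definition a bit more concrete, here is a minimal sketch of how a cookie arrives in a `Set-Cookie` header and how Python's standard library parses it. The header string is invented for illustration; it is not captured from the challenge server.

```python
from http.cookies import SimpleCookie

# Hypothetical Set-Cookie header value, made up for this example.
raw_header = 'info=B; Path=/; Domain=.pythonchallenge.com'

jar = SimpleCookie()
jar.load(raw_header)

# Each cookie is a "morsel": a value plus attributes such as path and domain.
morsel = jar['info']
cookie_value = morsel.value
cookie_domain = morsel['domain']
cookie_path = morsel['path']
print(cookie_value, cookie_domain, cookie_path)
```

The requests library does the same parsing for us and exposes the result as `response.cookies`.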

Is that what the cookies in this level mean? Let's take a look:

import requests

with requests.Session() as sess:
    sess.auth = ('huge', 'file')
    response = sess.get('http://www.pythonchallenge.com/pc/return/romance.html')
    print(response.cookies)
<RequestsCookieJar[]>

The server did not send us any cookies!


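Looking at the image again, there is a small picture in the lower-left corner that seems familiar.
Flipping back through the earlier levels shows that it is a shrunken copy of the image from level 4, linkedlist.php.
Could something be hidden there? Let's take a look: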

response = requests.get('http://www.pythonchallenge.com/pc/def/linkedlist.php')
print(response.cookies)
<RequestsCookieJar[<Cookie info=you+should+have+followed+busynothing... for .pythonchallenge.com/>]>

Sure enough, a cookie is hiding there.
The message from the server is (in URL encoding, + stands for a space):

you should have followed busynothing…
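
The decoding can be reproduced offline with the standard library. This small sketch simply hard-codes the cookie value printed above:

```python
from urllib.parse import unquote_plus

# Cookie value exactly as printed in the RequestsCookieJar above.
raw_info = 'you+should+have+followed+busynothing...'

# unquote_plus turns '+' into a space and decodes any %XX escapes.
message = unquote_plus(raw_info)
print(message)
```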

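What does that mean?
Recall level 4: we sent GET requests with nothing set to some value, and the server answered with the next nothing and a short message.
So is the cookie telling us to send busynothing instead?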

response = requests.get('http://www.pythonchallenge.com/pc/def/linkedlist.php?busynothing=12345')
print(response.text)
If you came here from level 4 - go back!<br>You should follow the obvious chain...<br><br>and the next busynothing is 44827
response = requests.get('http://www.pythonchallenge.com/pc/def/linkedlist.php?busynothing=44827')
print(response.text)
and the next busynothing is 45439
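
Pulling the next value out of each page is a one-line regular expression. A minimal sketch, run against the two responses shown above; the helper name `next_busynothing` and the last test string are ours, chosen only for illustration:

```python
import re
from typing import Optional

NEXT_RE = re.compile(r'next busynothing is (\d+)')


def next_busynothing(page_text: str) -> Optional[str]:
    """Return the next busynothing value, or None if the page has none."""
    match = NEXT_RE.search(page_text)
    return match.group(1) if match else None


first = next_busynothing(
    'If you came here from level 4 - go back!<br>You should follow the '
    'obvious chain...<br><br>and the next busynothing is 44827'
)
second = next_busynothing('and the next busynothing is 45439')
# Hypothetical page without a number, standing in for the end of the chain.
missing = next_busynothing('no number on this page')
print(first, second, missing)
```

Returning None for a page without a number gives the loop a clean stopping condition.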

The idea seems right, so let's run the level 4 approach again:

import re
import requests

url = 'http://www.pythonchallenge.com/pc/def/linkedlist.php?busynothing='
nothing = '12345'
with requests.Session() as sess:
    for i in range(400):
        response = sess.get(url + nothing).text
        result = re.findall(r'next busynothing is (\d+)', response)
        if not result:
            print('busynothing =', nothing)
            print(response)
            break
        nothing = result[0]
busynothing = 83051
that's it.

That's it?

Did we miss something? Remember that the theme of this level is cookies:

response = requests.get('http://www.pythonchallenge.com/pc/def/linkedlist.php?busynothing=83051')
print(response.cookies)
response = requests.get('http://www.pythonchallenge.com/pc/def/linkedlist.php?busynothing=12345')
print(response.cookies)
response = requests.get('http://www.pythonchallenge.com/pc/def/linkedlist.php?busynothing=44827')
print(response.cookies)
<RequestsCookieJar[<Cookie info=%90 for .pythonchallenge.com/>]>
<RequestsCookieJar[<Cookie info=B for .pythonchallenge.com/>]>
<RequestsCookieJar[<Cookie info=Z for .pythonchallenge.com/>]>

It looks like every response sets a cookie with an info field, so let's collect them all:

import re
import requests

url = 'http://www.pythonchallenge.com/pc/def/linkedlist.php?busynothing='
nothing = '12345'
cookies = []
with requests.Session() as sess:
    for i in range(400):
        response = sess.get(url + nothing)
        cookies.append(response.cookies['info'])
        result = re.findall(r'next busynothing is (\d+)', response.text)
        if not result:
            print('busynothing =', nothing)
            print(response.text)
            break
        nothing = result[0]
print(cookies)
busynothing = 83051
that's it.
['B', 'Z', 'h', '9', '1', 'A', 'Y', '%26', 'S', 'Y', '%94', '%3A', '%E2', 'I', '%00', '%00', '%21', '%19', '%80', 'P', '%81', '%11', '%00', '%AF', 'g', '%9E', '%A0', '+', '%00', 'h', 'E', '%3D', 'M', '%B5', '%23', '%D0', '%D4', '%D1', '%E2', '%8D', '%06', '%A9', '%FA', '%26', 'S', '%D4', '%D3', '%21', '%A1', '%EA', 'i', '7', 'h', '%9B', '%9A', '%2B', '%BF', '%60', '%22', '%C5', 'W', 'X', '%E1', '%AD', 'L', '%80', '%E8', 'V', '%3C', '%C6', '%A8', '%DB', 'H', '%26', '3', '2', '%18', '%A8', 'x', '%01', '%08', '%21', '%8D', 'S', '%0B', '%C8', '%AF', '%96', 'K', 'O', '%CA', '2', '%B0', '%F1', '%BD', '%1D', 'u', '%A0', '%86', '%05', '%92', 's', '%B0', '%92', '%C4', 'B', 'c', '%F1', 'w', '%24', 'S', '%85', '%09', '%09', 'C', '%AE', '%24', '%90']
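
For anyone who wants to test the loop without hitting the server 100+ times, the chain-walking logic can be separated from the network. This is only a sketch: `follow_chain`, `fetch` and the tiny fake server are our own names, and the fake chain ends after three pages, unlike the real one.

```python
import re
from typing import Callable, List, Tuple

NEXT_RE = re.compile(r'next busynothing is (\d+)')


def follow_chain(fetch: Callable[[str], Tuple[str, str]],
                 start: str, limit: int = 400) -> Tuple[str, List[str]]:
    """Walk the chain and return (last busynothing, collected cookie values).

    fetch(busynothing) must return (page text, value of the info cookie).
    """
    current = start
    collected: List[str] = []
    for _ in range(limit):
        text, info = fetch(current)
        collected.append(info)
        match = NEXT_RE.search(text)
        if match is None:
            return current, collected
        current = match.group(1)
    raise RuntimeError(f'chain did not end within {limit} requests')


# Fake server for illustration only: three pages instead of the real chain.
FAKE_PAGES = {
    '12345': ('and the next busynothing is 44827', 'B'),
    '44827': ('and the next busynothing is 45439', 'Z'),
    '45439': ("that's it.", 'h'),
}

last_id, chunks = follow_chain(FAKE_PAGES.__getitem__, '12345')
print(last_id, chunks)
```

With requests, `fetch` could wrap `sess.get(url + n)` and return `(response.text, response.cookies['info'])`. Raising an error when the limit is reached avoids silently returning a truncated cookie list.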

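The cookie contents give us another sense of déjà vu: the leading BZh91AY&SY recalls the bzip2-compressed data from level 8, integrity.html.
Since sequences starting with % are URL percent-escapes, and each + should be replaced with a space: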

from bz2 import decompress

data = requests.compat.unquote_plus(''.join(cookies), encoding='latin1').encode('latin1')
print(data)
print(decompress(data).decode())
b'BZh91AY&SY\x94:\xe2I\x00\x00!\x19\x80P\x81\x11\x00\xafg\x9e\xa0 \x00hE=M\xb5#\xd0\xd4\xd1\xe2\x8d\x06\xa9\xfa&S\xd4\xd3!\xa1\xeai7h\x9b\x9a+\xbf`"\xc5WX\xe1\xadL\x80\xe8V<\xc6\xa8\xdbH&32\x18\xa8x\x01\x08!\x8dS\x0b\xc8\xaf\x96KO\xca2\xb0\xf1\xbd\x1du\xa0\x86\x05\x92s\xb0\x92\xc4Bc\xf1w$S\x85\t\tC\xae$\x90'
is it the 26th already? call his father and inform him that "the flowers are on their way". he'll understand.
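
An alternative sketch that avoids the latin1 round trip: `urllib.parse.unquote_to_bytes` decodes percent-escapes straight to bytes, though it leaves + alone, so we replace it first. The helper name `cookies_to_bytes` is ours, and the round-trip message below is just test data.

```python
import bz2
from typing import List
from urllib.parse import quote_plus, unquote_to_bytes


def cookies_to_bytes(chunks: List[str]) -> bytes:
    """Join URL-encoded cookie values and decode them to raw bytes."""
    # A literal '+' means a space; an encoded plus arrives as %2B and
    # is decoded by unquote_to_bytes afterwards, so the order is safe.
    joined = ''.join(chunks).replace('+', ' ')
    return unquote_to_bytes(joined)


# The first few cookie values from the list above.
header = cookies_to_bytes(
    ['B', 'Z', 'h', '9', '1', 'A', 'Y', '%26', 'S', 'Y', '%94'])
print(header)

# Round trip on made-up data: compress, URL-encode, then decode again.
original = b'the flowers are on their way'
encoded = quote_plus(bz2.compress(original))
round_trip = bz2.decompress(cookies_to_bytes([encoded]))
print(round_trip)
```

The `BZh` prefix is the bzip2 magic number, which is what makes the cookie stream recognizable at a glance.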

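Right: the mention of the 26th and of flowers takes us back to the calendar in level 15, uzi.html.
"call his father" means phoning Mozart's father, which in turn recalls the phone book from level 13, disproportional.html: phonebook.php, an XML-RPC service.
Mozart's father was named Leopold.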

from xmlrpc.client import ServerProxy

server = ServerProxy('http://www.pythonchallenge.com/pc/phonebook.php')
print(server.phone('Leopold'))
555-VIOLIN
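
To see what ServerProxy actually puts on the wire, the request body can be built offline with `xmlrpc.client.dumps`. A minimal sketch; nothing is sent to the server here:

```python
from xmlrpc.client import dumps, loads

# Build the XML body of the call phone('Leopold') without sending it.
payload = dumps(('Leopold',), methodname='phone')
print(payload)

# loads() parses it back into (params, method name).
params, method = loads(payload)
print(params, method)
```

ServerProxy POSTs this XML to phonebook.php and parses the XML response in the same way.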

We change the address to violin.html, which tells us:

no! i mean yes! but ../stuff/violin.php.

So we move on to violin.php:

  • leopold.jpg
  • The page title is "it's me. what do you want?", the page body is empty, and the source hides nothing either.

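Given this title and the fact that the page is a PHP script, the request presumably has to carry some extra information.
Combining the earlier hint, call his father and inform him that "the flowers are on their way", with this level's cookie theme, our guess is that the message should be sent as a cookie: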

cookie = requests.cookies.cookiejar_from_dict({'info': 'the flowers are on their way'})
response = requests.get('http://www.pythonchallenge.com/pc/stuff/violin.php', cookies=cookie).text
print(response)
<html>
<head>
  <title>it's me. what do you want?</title>
  <link rel="stylesheet" type="text/css" href="../style.css">
</head>
<body>
	<br><br>
	<center><font color="gold">
	<img src="leopold.jpg" border="0"/>
<br><br>
oh well, don't you dare to forget the balloons.</font>
</body>
</html>
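
Under the hood this is just a `Cookie` request header. The sketch below builds the same request with the standard library, without sending it. Encoding spaces as + mirrors the way the server encoded its own cookie earlier; that the server accepts this form is our assumption, since the run above sent the spaces as they are.

```python
from urllib.parse import quote_plus
from urllib.request import Request

VIOLIN_URL = 'http://www.pythonchallenge.com/pc/stuff/violin.php'


def build_request(message: str) -> Request:
    """Build (but do not send) a GET request carrying message as info."""
    # Assumption: spaces are sent as '+', as in the server's own cookies.
    cookie_header = 'info=' + quote_plus(message)
    return Request(VIOLIN_URL, headers={'Cookie': cookie_header})


request = build_request('the flowers are on their way')
cookie_sent = request.get_header('Cookie')
print(cookie_sent)
```

Passing the request to `urllib.request.urlopen` would actually send it.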

Don't forget the balloons: we change the address to balloons.html and finally reach the next level! Hoooo!

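Summary: this level is quite a hassle. It draws on information and techniques from several earlier levels, and the trial and error took far longer than this write-up suggests. Along the way we learned how to work with cookies in the requests library, and how requests.Session() makes repeated requests more efficient by reusing the underlying connection.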

Code for this level: 17_romance.ipynb