继续挑战,难度开始上升了
第17题地址romance.html
- 网页标题是
eat?
,题目内容为空,源码也没有隐藏信息
没什么说的,肯定是从读图开始。
图片上是一些小饼干,看看图片名字叫——cookies
。嗯?cookies
,Cookies
!
Cookie(复数形态Cookies),又称为“小甜饼”。类型为“小型文本文件”,指某些网站为了辨别用户身份而储存在用户本地终端(Client Side)上的数据(通常经过加密)。由网景公司的前雇员卢·蒙特利在1993年3月发明。最初定义于RFC 2109。当前使用最广泛的 Cookie标准却不是RFC中定义的任何一个,而是在网景公司制定的标准上进行扩展后的产物。
From wikipedia.org
题目的cookies
是不是这个意思呢?我们来看一看:
import requests
with requests.Session() as sess:
sess.auth = ('huge', 'file')
response = sess.get('http://www.pythonchallenge.com/pc/return/romance.html')
print(response.cookies)
<RequestsCookieJar[]>
服务器并没有给我们返回Cookies
啊!
我们继续看图,发现左下角还有一个小图,看上去有似曾相识的感觉。
翻了翻前面的题目发现,这是第4题linkedlist.php的图片缩小版。
难道这里面藏了东西?我们来看一看:
response = requests.get('http://www.pythonchallenge.com/pc/def/linkedlist.php')
print(response.cookies)
<RequestsCookieJar[<Cookie info=you+should+have+followed+busynothing... for .pythonchallenge.com/>]>
果然藏着Cookies
!
服务器返回的信息是(+
在html里是空格):
you should have followed busynothing…
这是什么意思呢?
让我们来回顾一下第4题,是使用GET
方法将nothing
设置成一定值来向服务器请求下一个nothing
及相关信息的。
那这句话的意思是不是要我们换成busynothing
来请求?
response = requests.get('http://www.pythonchallenge.com/pc/def/linkedlist.php?busynothing=12345')
print(response.text)
If you came here from level 4 - go back!<br>You should follow the obvious chain...<br><br>and the next busynothing is 44827
response = requests.get('http://www.pythonchallenge.com/pc/def/linkedlist.php?busynothing=44827')
print(response.text)
and the next busynothing is 45439
看来思路没错,我们按第4题思路重来一遍:
import re
import requests
url = 'http://www.pythonchallenge.com/pc/def/linkedlist.php?busynothing='
nothing = '12345'
with requests.Session() as sess:
for i in range(400):
response = sess.get(url + nothing).text
result = re.findall(r'next busynothing is (\d+)', response)
if not result:
print('busynothing =', nothing)
print(response)
break
nothing = result[0]
busynothing = 83051
that's it.
没了?
我们是不是漏掉了啥?想想这题的主题是Cookies
:
response = requests.get('http://www.pythonchallenge.com/pc/def/linkedlist.php?busynothing=83051')
print(response.cookies)
response = requests.get('http://www.pythonchallenge.com/pc/def/linkedlist.php?busynothing=12345')
print(response.cookies)
response = requests.get('http://www.pythonchallenge.com/pc/def/linkedlist.php?busynothing=44827')
print(response.cookies)
<RequestsCookieJar[<Cookie info=%90 for .pythonchallenge.com/>]>
<RequestsCookieJar[<Cookie info=B for .pythonchallenge.com/>]>
<RequestsCookieJar[<Cookie info=Z for .pythonchallenge.com/>]>
看来每次请求都会返回一个带有info
字段内容的Cookies
,我们收集起来:
import re
import requests
url = 'http://www.pythonchallenge.com/pc/def/linkedlist.php?busynothing='
nothing = '12345'
cookies = []
with requests.Session() as sess:
for i in range(400):
response = sess.get(url + nothing)
cookies.append(response.cookies['info'])
result = re.findall(r'next busynothing is (\d+)', response.text)
if not result:
print('busynothing =', nothing)
print(response.text)
break
nothing = result[0]
print(cookies)
busynothing = 83051
that's it.
['B', 'Z', 'h', '9', '1', 'A', 'Y', '%26', 'S', 'Y', '%94', '%3A', '%E2', 'I', '%00', '%00', '%21', '%19', '%80', 'P', '%81', '%11', '%00', '%AF', 'g', '%9E', '%A0', '+', '%00', 'h', 'E', '%3D', 'M', '%B5', '%23', '%D0', '%D4', '%D1', '%E2', '%8D', '%06', '%A9', '%FA', '%26', 'S', '%D4', '%D3', '%21', '%A1', '%EA', 'i', '7', 'h', '%9B', '%9A', '%2B', '%BF', '%60', '%22', '%C5', 'W', 'X', '%E1', '%AD', 'L', '%80', '%E8', 'V', '%3C', '%C6', '%A8', '%DB', 'H', '%26', '3', '2', '%18', '%A8', 'x', '%01', '%08', '%21', '%8D', 'S', '%0B', '%C8', '%AF', '%96', 'K', 'O', '%CA', '2', '%B0', '%F1', '%BD', '%1D', 'u', '%A0', '%86', '%05', '%92', 's', '%B0', '%92', '%C4', 'B', 'c', '%F1', 'w', '%24', 'S', '%85', '%09', '%09', 'C', '%AE', '%24', '%90']
一看这Cookies
的内容就又有了似曾相识的感觉,让我们回想到了第8题integrity.html的bzip2
压缩编码。
考虑到%
开头的是html的转义字符,而+
应该要替换成空格:
from bz2 import decompress
data = requests.compat.unquote_plus(''.join(cookies), encoding='latin1').encode('latin1')
print(data)
print(decompress(data).decode())
b'BZh91AY&SY\x94:\xe2I\x00\x00!\x19\x80P\x81\x11\x00\xafg\x9e\xa0 \x00hE=M\xb5#\xd0\xd4\xd1\xe2\x8d\x06\xa9\xfa&S\xd4\xd3!\xa1\xeai7h\x9b\x9a+\xbf`"\xc5WX\xe1\xadL\x80\xe8V<\xc6\xa8\xdbH&32\x18\xa8x\x01\x08!\x8dS\x0b\xc8\xaf\x96KO\xca2\xb0\xf1\xbd\x1du\xa0\x86\x05\x92s\xb0\x92\xc4Bc\xf1w$S\x85\t\tC\xae$\x90'
is it the 26th already? call his father and inform him that "the flowers are on their way". he'll understand.
好了,提到了26号和flowers,又让我们想起了第15题uzi.html那个日历。
call his father
,也就是打电话给Mozart的父亲,也就让我们想起了第13题disproportional.html那个电话phonebook.php,是一个XML-RPC
。
Mozart的父亲叫Leopold:
from xmlrpc.client import ServerProxy
server = ServerProxy('http://www.pythonchallenge.com/pc/phonebook.php')
print(server.phone('Leopold'))
555-VIOLIN
我们把地址转到violin.html,提示:
no! i mean yes! but ../stuff/violin.php.
于是我们把地址再转到violin.php:
- 网页标题是
it's me. what do you want?
,内容为空,源码也没有隐藏信息
结合这个标题和其php
性质,应该是要附带一些参数信息来请求。
再回看刚才的提示call his father and inform him that "the flowers are on their way"
和这题的主题Cookies
,我们猜想是设置Cookies
来进行请求:
cookie = requests.cookies.cookiejar_from_dict({'info': 'the flowers are on their way'})
response = requests.get('http://www.pythonchallenge.com/pc/stuff/violin.php', cookies=cookie).text
print(response)
<html>
<head>
<title>it's me. what do you want?</title>
<link rel="stylesheet" type="text/css" href="../style.css">
</head>
<body>
<br><br>
<center><font color="gold">
<img src="leopold.jpg" border="0"/>
<br><br>
oh well, don't you dare to forget the balloons.</font>
</body>
</html>
别忘了balloons
,我们把地址改为balloons.html,终于来到了下一题!Hoooo!