#17

On with the challenge; the difficulty is starting to ramp up.

Level 17 is at romance.html.

The page title is "eat?", the page body is empty, and the source contains no hidden information.

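There is nothing else to go on, so as usual we start with the picture.
It shows some small biscuits, and the image file is named cookies. Hmm? cookies... Cookies!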

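A cookie (plural: cookies), also known as a "small sweet biscuit", is a small text file: data (usually encrypted) that some websites store on the user's local terminal (client side) in order to identify the user. It was invented by Lou Montulli, a former Netscape employee, in March 1993. It was first defined in RFC 2109. The most widely used cookie standard today, however, is not any of those defined in an RFC but an extension of the standard drawn up by Netscape.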

From wikipedia.org
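
To make the definition a bit more concrete, here is a minimal sketch of how a cookie sent by a server can be parsed with the standard library's `http.cookies` module. The header value below is made up for illustration and is not taken from the challenge server.

```python
from http.cookies import SimpleCookie

# Hypothetical Set-Cookie header value, invented for this example.
header = 'info=hello; Domain=.example.com; Path=/'

jar = SimpleCookie()
jar.load(header)

# Each cookie is stored as a "morsel": a name, a value, and its attributes.
morsel = jar['info']
print(morsel.key, morsel.value)
print(morsel['domain'], morsel['path'])
```

The `requests` library used in this post exposes the same kind of information through `response.cookies`.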

Is this what the puzzle means by cookies? Let's take a look:

import requests

with requests.Session() as sess:
    sess.auth = ('huge', 'file')
    response = sess.get('http://www.pythonchallenge.com/pc/return/romance.html')
    print(response.cookies)

<RequestsCookieJar[]>

The server didn't send back any cookies!

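Looking at the picture again, there is a small image in the lower-left corner that seems familiar.
Flipping back through the earlier levels, it turns out to be a shrunken version of the picture from level 4, linkedlist.php.
Could something be hidden there? Let's take a look: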

response = requests.get('http://www.pythonchallenge.com/pc/def/linkedlist.php')
print(response.cookies)

<RequestsCookieJar[<Cookie info=you+should+have+followed+busynothing... for .pythonchallenge.com/>]>

Sure enough, there is a cookie hiding there!
The message from the server is (in URL encoding, + stands for a space):

you should have followed busynothing…
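
The decoding of the cookie value can also be done in code rather than by eye. A small sketch using the standard library's `urllib.parse.unquote_plus`, with the raw value copied from the cookie printed above:

```python
from urllib.parse import unquote_plus

# Raw cookie value as printed in the RequestsCookieJar above.
raw = 'you+should+have+followed+busynothing...'

# unquote_plus turns '+' into a space and also resolves %XX escapes.
message = unquote_plus(raw)
print(message)
```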

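What does this mean?
Recall level 4: we sent GET requests with the nothing parameter set to some value, and the server replied with the next nothing and some related information.
So is this message telling us to make the same requests with busynothing instead?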

response = requests.get('http://www.pythonchallenge.com/pc/def/linkedlist.php?busynothing=12345')
print(response.text)

If you came here from level 4 - go back!<br>You should follow the obvious chain...<br><br>and the next busynothing is 44827

response = requests.get('http://www.pythonchallenge.com/pc/def/linkedlist.php?busynothing=44827')
print(response.text)

and the next busynothing is 45439

The idea seems right, so we repeat the level 4 approach:

import re
import requests

url = 'http://www.pythonchallenge.com/pc/def/linkedlist.php?busynothing='
nothing = '12345'
with requests.Session() as sess:
    for i in range(400):
        response = sess.get(url + nothing).text
        result = re.findall(r'next busynothing is (\d+)', response)
        if not result:
            print('busynothing =', nothing)
            print(response)
            break
        nothing = result[0]

busynothing = 83051
that's it.

That's all?

Did we miss something? Remember that the theme of this level is cookies:

response = requests.get('http://www.pythonchallenge.com/pc/def/linkedlist.php?busynothing=83051')
print(response.cookies)
response = requests.get('http://www.pythonchallenge.com/pc/def/linkedlist.php?busynothing=12345')
print(response.cookies)
response = requests.get('http://www.pythonchallenge.com/pc/def/linkedlist.php?busynothing=44827')
print(response.cookies)

<RequestsCookieJar[<Cookie info=%90 for .pythonchallenge.com/>]>
<RequestsCookieJar[<Cookie info=B for .pythonchallenge.com/>]>
<RequestsCookieJar[<Cookie info=Z for .pythonchallenge.com/>]>

So every response carries a cookie with an info field. Let's collect them:

import re
import requests

url = 'http://www.pythonchallenge.com/pc/def/linkedlist.php?busynothing='
nothing = '12345'
cookies = []  # one info value per response, in chain order
with requests.Session() as sess:
    for _ in range(400):
        response = sess.get(url + nothing)
        cookies.append(response.cookies['info'])
        result = re.findall(r'next busynothing is (\d+)', response.text)
        if not result:
            # No next number in the page: we have reached the end of the chain.
            print('busynothing =', nothing)
            print(response.text)
            break
        nothing = result[0]
print(cookies)

busynothing = 83051
that's it.
['B', 'Z', 'h', '9', '1', 'A', 'Y', '%26', 'S', 'Y', '%94', '%3A', '%E2', 'I', '%00', '%00', '%21', '%19', '%80', 'P', '%81', '%11', '%00', '%AF', 'g', '%9E', '%A0', '+', '%00', 'h', 'E', '%3D', 'M', '%B5', '%23', '%D0', '%D4', '%D1', '%E2', '%8D', '%06', '%A9', '%FA', '%26', 'S', '%D4', '%D3', '%21', '%A1', '%EA', 'i', '7', 'h', '%9B', '%9A', '%2B', '%BF', '%60', '%22', '%C5', 'W', 'X', '%E1', '%AD', 'L', '%80', '%E8', 'V', '%3C', '%C6', '%A8', '%DB', 'H', '%26', '3', '2', '%18', '%A8', 'x', '%01', '%08', '%21', '%8D', 'S', '%0B', '%C8', '%AF', '%96', 'K', 'O', '%CA', '2', '%B0', '%F1', '%BD', '%1D', 'u', '%A0', '%86', '%05', '%92', 's', '%B0', '%92', '%C4', 'B', 'c', '%F1', 'w', '%24', 'S', '%85', '%09', '%09', 'C', '%AE', '%24', '%90']

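These cookie values look familiar too: they remind us of the bzip2-compressed data from level 8, integrity.html.
The values starting with % are URL percent-escapes, and + has to be replaced with a space: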

from bz2 import decompress

data = requests.compat.unquote_plus(''.join(cookies), encoding='latin1').encode('latin1')
print(data)
print(decompress(data).decode())

b'BZh91AY&SY\x94:\xe2I\x00\x00!\x19\x80P\x81\x11\x00\xafg\x9e\xa0 \x00hE=M\xb5#\xd0\xd4\xd1\xe2\x8d\x06\xa9\xfa&S\xd4\xd3!\xa1\xeai7h\x9b\x9a+\xbf`"\xc5WX\xe1\xadL\x80\xe8V<\xc6\xa8\xdbH&32\x18\xa8x\x01\x08!\x8dS\x0b\xc8\xaf\x96KO\xca2\xb0\xf1\xbd\x1du\xa0\x86\x05\x92s\xb0\x92\xc4Bc\xf1w$S\x85\t\tC\xae$\x90'
is it the 26th already? call his father and inform him that "the flowers are on their way". he'll understand.
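
For readers who want to try the decoding step without contacting the server, here is an offline round-trip sketch. The payload is made up, and it uses `urllib.parse.unquote_to_bytes` instead of the `unquote_plus(..., encoding='latin1').encode('latin1')` call above; the assumption is that both approaches yield the same bytes for this kind of input.

```python
import bz2
from urllib.parse import quote_from_bytes, unquote_to_bytes

original = b'hello, cookies'  # made-up payload for illustration
compressed = bz2.compress(original)

# Encode each byte on its own, writing a space as '+', to mimic the
# one-character-per-cookie values collected from the server.
pieces = [
    quote_from_bytes(bytes([b]), safe='').replace('%20', '+')
    for b in compressed
]

# Decode: turn '+' back into a space, then undo the percent-encoding.
data = unquote_to_bytes(''.join(pieces).replace('+', ' '))

print(data[:3])  # bzip2 streams start with the magic bytes b'BZh'
print(bz2.decompress(data))
```

Working with bytes directly avoids the detour through a latin1 string, but either route should do for a stream like this one.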

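Now we have the 26th and flowers, which reminds us of the calendar from level 15, uzi.html.
"call his father" means phoning Mozart's father, which in turn reminds us of the phone book phonebook.php from level 13, disproportional.html, an XML-RPC service.
Mozart's father was named Leopold: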

from xmlrpc.client import ServerProxy

server = ServerProxy('http://www.pythonchallenge.com/pc/phonebook.php')
print(server.phone('Leopold'))

555-VIOLIN
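
As a rough illustration of what `ServerProxy` does behind the scenes, the sketch below builds the XML-RPC request body for the same call with `xmlrpc.client.dumps` and parses it back with `loads`. Nothing is sent over the network here.

```python
from xmlrpc.client import dumps, loads

# Serialize the call phone('Leopold') into an XML-RPC request body.
request_body = dumps(('Leopold',), methodname='phone')
print(request_body)

# Parsing the body again recovers the parameters and the method name.
params, method = loads(request_body)
print(method, params)
```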

We change the address to violin.html and get this hint:

no! i mean yes! but ../stuff/violin.php.

So we go on to violin.php:

The page title is "it's me. what do you want?", the body is empty, and the source contains no hidden information.

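Given the title and the fact that it is a PHP page, the request probably has to carry some extra information.
Looking back at the hint call his father and inform him that "the flowers are on their way" and at this level's theme, cookies, our guess is that the request should be sent with a cookie set: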

cookie = requests.cookies.cookiejar_from_dict({'info': 'the flowers are on their way'})
response = requests.get('http://www.pythonchallenge.com/pc/stuff/violin.php', cookies=cookie).text
print(response)

<html>
<head>
  <title>it's me. what do you want?</title>
  <link rel="stylesheet" type="text/css" href="../style.css">
</head>
<body>
	<br><br>
	<center><font color="gold">
	<img src="leopold.jpg" border="0"/>
<br><br>
oh well, don't you dare to forget the balloons.</font>
</body>
</html>
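
To see what setting a cookie amounts to at the HTTP level, here is a sketch that builds (but does not send) an equivalent request with the standard library. It assumes the cookie travels as a plain `Cookie: info=...` header; the exact header that `requests` emits for the call above was not inspected.

```python
from urllib.request import Request

url = 'http://www.pythonchallenge.com/pc/stuff/violin.php'
message = 'the flowers are on their way'

# A cookie is just a request header of the form "Cookie: name=value".
req = Request(url, headers={'Cookie': 'info=' + message})
print(req.get_header('Cookie'))
```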

Don't forget the balloons: we change the address to balloons.html and finally arrive at the next level! Hoooo!

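Summary: this level is quite a hassle. It draws on information and techniques from several earlier levels, and the trial and error took a good deal more time than this write-up suggests. Along the way we learned how to work with cookies in the `requests` library, and how `requests.Session()` makes repeated requests more efficient.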

The code for this level is in 17_romance.ipynb.
