#6 | pythonchallenge

继续挑战！从这题就开始考验发散性思维了，逐渐展现出这个挑战的恐怖。。。

第6题地址channel.html

网页标题是now there are pairs，题目内容为空

惯例打开源码，除了最后面说明了是无用信息之外，只有一行隐藏信息：

<!– <-- zip –>

zip正好也是图片中拉链的意思，说明这是个重要提示。而图片文件名叫channel.jpg，我们是不是可以试试channel.zip：

from io import BytesIO
from zipfile import ZipFile
import requests
channel = requests.get('http://www.pythonchallenge.com/pc/def/channel.zip').content
with ZipFile(BytesIO(channel), 'r') as f:
    file_list = f.filelist
    with f.open('readme.txt', 'r') as f_readme:
        read_me = f_readme.read()
print(file_list[-5:])
print(read_me.decode())

[<ZipInfo filename='99460.txt' compress_type=deflate file_size=21 compress_size=23>, <ZipInfo filename='99714.txt' compress_type=deflate file_size=21 compress_size=23>, <ZipInfo filename='99775.txt' compress_type=deflate file_size=21 compress_size=23>, <ZipInfo filename='99905.txt' compress_type=deflate file_size=21 compress_size=23>, <ZipInfo filename='readme.txt' compress_type=deflate filemode='-rw-r--r--' file_size=84 compress_size=78>]
welcome to my zipped list.

hint1: start from 90052
hint2: answer is inside the zip

可以直接下载回来研究，这里我为了演示就不生成文件了。在zip文件中我们可以看到是一堆'*****.txt'和一个'readme.txt'文件，每个数字.txt文件里面几乎写的都是Next nothing is *****，这里我们就联想到前面的某一题，只不过换成了文件的形式。
我们来试一试：

import re
nothing = '90052'
with ZipFile(BytesIO(channel), 'r') as f:
    while True:
        with f.open(nothing + '.txt', 'r') as f_read:
            content = f_read.read().decode()
        result = re.findall(r'next nothing is (\d+)', content, re.IGNORECASE)
        if result:
            nothing = result[0]
        else:
            print('nothing =', nothing)
            print(content)
            break

nothing = 46145
Collect the comments.

comments，注释？再次打开zip文件一看，注释是空的啊。

这里就要研究一下zip文件的格式结构了。
首先是若干个字节的文件头，记录整个压缩文件的压缩方式、大小等信息
然后是每个压缩文件的压缩数据
然后是依次记录每个压缩文件的压缩方式、大小等信息的目录结构信息
最后是文件结束标志信息

其实我们关注的是comment注释，它会出现在什么地方呢？首先会出现在文件头中，也就是整个压缩文件的注释信息，就是我们打开文件时看到的空的那个。还会出现在记录每个压缩文件信息的目录结构信息中，在每段信息的最后依次是文件名+扩展信息+注释信息（具体可以用记事本打开一个zip文件研究一下）。

'46145.txt'中说到'Collect the comments.'，其实就是想让我们依次收集每一个文件的注释，幸好我们有ZipInfo类（ZipFile类的构造方法中已经读取读取到了filelist列表和NameToInfo字典中，也提供了getinfo(name)方法供我们按文件名提取出来），我们修改一下代码：

nothing = '90052'
comments = b''
with ZipFile(BytesIO(channel), 'r') as f:
    while True:
        comments += f.getinfo(nothing + '.txt').comment
        with f.open(nothing + '.txt', 'r') as f_read:
            content = f_read.read().decode()
        result = re.findall(r'next nothing is (\d+)', content, re.IGNORECASE)
        if result:
            nothing = result[0]
        else:
            print('nothing =', nothing)
            print(content)
            print(comments.decode())
            break

nothing = 46145
Collect the comments.
****************************************************************
****************************************************************
**                                                            **
**   OO    OO    XX      YYYY    GG    GG  EEEEEE NN      NN  **
**   OO    OO  XXXXXX   YYYYYY   GG   GG   EEEEEE  NN    NN   **
**   OO    OO XXX  XXX YYY   YY  GG GG     EE       NN  NN    **
**   OOOOOOOO XX    XX YY        GGG       EEEEE     NNNN     **
**   OOOOOOOO XX    XX YY        GGG       EEEEE      NN      **
**   OO    OO XXX  XXX YYY   YY  GG GG     EE         NN      **
**   OO    OO  XXXXXX   YYYYYY   GG   GG   EEEEEE     NN      **
**   OO    OO    XX      YYYY    GG    GG  EEEEEE     NN      **
**                                                            **
****************************************************************
 **************************************************************

修改地址为hockey.html，打开后提示

it’s in the air. look at the letters.

我们发现图画的每个字母都是由同一个字母组成的，提示让我们关注字母，结合前半句，很容易得到结果oxygen.html，打开后进入下一题！

#6

Categories

Tags

第6题地址channel.html

总结：这题做起来还是比较麻烦的，毕竟要有不少知识储备。

本题代码地址6_channel.ipynb