Writing Web Clients

Web Clients -- The Tutorial


Anya (Family, season 5) -- Thank you for coming. We value your patronage.

What Are Web Clients?


Giles (Family, season 5) -- Could we please be a little less effusive, Anya?

What Are Web Clients Useful For?


Harmony (Family, season 5) -- Aww. You're my little lamb.

Review of Modules


Buffy (Family, season 5) -- Your definition of narrow is impressively wide.

Modules -- htmllib


Xander (Family, season 5) -- The answer is somewhere here.

Modules -- htmllib -- idiomatic usage

# For lists of links
import htmllib, formatter

# htmlString is assumed to hold the text of an HTML page
h = htmllib.HTMLParser(formatter.NullFormatter())
h.feed(htmlString)
print h.anchorlist    # the href of every <a> tag seen

Xander (Family, season 5) -- I'm helping, I'm reading, I'm quiet.

Modules -- htmllib -- idiomatic usage (cont'd)

import htmllib, formatter

class IMGFinder(htmllib.HTMLParser):

    def __init__(self, *args, **kw):
        htmllib.HTMLParser.__init__(self, *args, **kw)
        self.ims = []

    def handle_image(self, src, *args): self.ims.append(src)

h = IMGFinder(formatter.NullFormatter())
h.feed(htmlString)
print h.ims

Donny (Family, season 5) -- Look what I found!

Modules -- htmllib -- base


Dawn (Family, season 5) -- This is the source of my gladness.

Modules -- htmllib -- base (example)


Riley (Family, season 5) -- Every time I think I'm getting close to you...

Modules -- urllib/urllib2


Glory (Family, season 5) -- I am great and I am beautiful.

Modules -- urllib/urllib2 (cont'd)


Joyce (Ted, season 2) -- He redid my entire system.

Modules -- urllib/urllib2 (examples)
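
A minimal sketch of two calls the later slides lean on: urlencode to build form data, and Request to describe a request before it is opened. The host and the form field are made up for illustration, and the try/except imports are an assumption added here so the sketch runs on Python 2 (urllib, urllib2) as well as Python 3 (urllib.parse, urllib.request). Nothing is sent over the network.

```python
try:                                    # Python 2
    from urllib import urlencode
    from urllib2 import Request
except ImportError:                     # Python 3
    from urllib.parse import urlencode
    from urllib.request import Request

# Hypothetical search form; building a Request does not open a connection.
form = urlencode({'q': 'dilbert'})
get_req = Request('http://www.example.org/search?' + form)
post_req = Request('http://www.example.org/search', form.encode('ascii'))

print(form)
print(get_req.get_method())    # no data attached, so this is a GET
print(post_req.get_method())   # data attached, so this is a POST
```

Passing either Request object to urlopen would perform the actual fetch.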


Xander (Ted, season 2) -- Yum-my!

Digression -- HTTP Overview


Tara (Family, season 5) -- ...in terms of the karmic cycle.

Example HTTP Sessions

GET /foo/bar.html HTTP/1.0
Host: www.example.org
<blank line>
HTTP/1.0 200 OK
Content-Type: text/html

<html><body>lalalala</body></html>
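
The response half of the session above can be taken apart by hand, which may help show what the library modules do for you. This is only a sketch: split_response is a hypothetical helper, not part of any module, and it ignores repeated headers, continuation lines and anything else a real parser must handle.

```python
# The response from the session above, with explicit CRLF line endings.
RAW_RESPONSE = (
    "HTTP/1.0 200 OK\r\n"
    "Content-Type: text/html\r\n"
    "\r\n"
    "<html><body>lalalala</body></html>"
)

def split_response(raw):
    """Split a raw response into (version, status, reason, headers, body)."""
    # The first blank line separates the head from the body.
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    # Status line: protocol version, numeric code, free-text reason.
    version, status, reason = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return version, int(status), reason, headers, body

version, status, reason, headers, body = split_response(RAW_RESPONSE)
print("%d %s" % (status, reason))
print(headers["content-type"])
```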

Giles (Family, season 5) -- And you are talking about what on earth?

Modules -- httplib


Mr. MacLay (Family, season 5) -- We know how to control her...problem.

Modules -- httplib -- example

>>> import httplib
>>> h=httplib.HTTP("moshez.org")
>>> h.putrequest('GET', '/')
>>> h.putheader('Host', 'moshez.org')
>>> h.endheaders()
>>> h.getreply()
(200, 'OK', <mimetools.Message instance at 0x81220dc>)
>>> h.getfile().read(10)
"<HTML>\n<HE"

Anya (Family, season 5) -- ...and it was fun!

Modules -- urlparse
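
A short sketch of the two urlparse functions the download examples rely on: urlparse splits a URL into its parts, and urljoin resolves a link found in a page against the page's own URL. The URLs are made up, and the try/except import is an assumption added so the sketch runs on Python 2 (urlparse) and Python 3 (urllib.parse).

```python
try:                                    # Python 2
    from urlparse import urlparse, urljoin
except ImportError:                     # Python 3
    from urllib.parse import urlparse, urljoin

base = 'http://www.example.org/foo/bar.html'

parts = urlparse(base)
print(parts[1])   # network location: www.example.org
print(parts[2])   # path: /foo/bar.html

# A relative link replaces the last path segment of the base URL.
relative = urljoin(base, 'baz.gif')
# A link starting with "/" replaces the whole path.
absolute = urljoin(base, '/comics/today.gif')
print(relative)
print(absolute)
```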


Buffy (Family, season 5) -- You know what, you guys, just leave it here.

Downloading Dilbert

import re, urllib, urllib2, urlparse

URL = 'http://www.dilbert.com/'
f = urllib2.urlopen(URL)
s = f.read()
# Find the first link to a comic strip image on the front page
href = re.compile('<a href="(/comics/.*?/dilbert.*?gif)">')
m = href.search(s)
# urlretrieve lives in urllib, not urllib2
urllib.urlretrieve(urlparse.urljoin(URL, m.group(1)),
                   "dilbert.gif")

Tara (Family, season 5) -- That was funny if you [...] are a complete dork.

Downloading Dark Angel Transcripts

import re, urllib, urllib2, urlparse, htmllib, formatter, posixpath
URL = "http://www.darkangelfan.com/episode/"
LINK_RE = re.compile(r'/trans_[0-9]+\.shtml$')
s = urllib2.urlopen(URL).read()
h = htmllib.HTMLParser(formatter.NullFormatter())
h.feed(s)
links = [urlparse.urljoin(URL, link)
              for link in h.anchorlist if LINK_RE.search(link)]
### -- really download --
for link in links:
    urllib.urlretrieve(link, posixpath.basename(link))

Intern (Family, season 5) -- Yeah. That makes like five this month.

Downloading Dark Angel Transcripts (select)

class Downloader:

    def __init__(self, fin, fout):
        self.fin, self.fout, self.fileno = fin, fout, fin.fileno

    def read(self):
        buf = self.fin.read(4096)
        if not buf:
            for f in [self.fout, self.fin]: f.close()
            return 1
        self.fout.write(buf)

Joyce (Ted, season 2) -- I've been looking for the right moment.

Downloading Dark Angel Transcripts (select, cont'd)

import select

downloaders = [Downloader(urllib2.urlopen(link),
                 open(posixpath.basename(link), 'wb'))
                                      for link in links]
while downloaders:
    # select() takes (read, write, except) lists; wait on the readers only
    toRead, _, _ = select.select(downloaders, [], [])
    for downloader in toRead:
         if downloader.read():
             downloaders.remove(downloader)

Buffy (Family, season 5) -- Tara's damn birthday is just one too many things for me to worry about.

Downloading Dark Angel Transcripts (threads)

import threading, urllib

for link in links:
    # One thread per transcript; start() is what actually runs it
    threading.Thread(target=urllib.urlretrieve,
                     args=(link, posixpath.basename(link))).start()

Buffy (Ted, season 2) -- Sounds like fun.

Digression - twisted.web.client


Buffy (Ted, season 2) -- You're supposed to use your powers for good!

Downloading Dark Angel Transcripts (web.client)

from twisted.web import client
from twisted.internet import reactor, defer

defer.DeferredList(
[client.downloadPage(link, posixpath.basename(link))
            for link in links]).addBoth(lambda _: reactor.stop())
reactor.run()

Ted (Ted, season 2) -- You don't have to worry about anything.

HTTP Authentication


Buffy (Ted, season 2) -- Ummm... Who are these people?

HTTP Authentication - manually

import base64, httplib
user = 'moshez'
password = 's3kr1t'
h = httplib.HTTP("localhost")
h.putrequest('GET', '/protected/stuff.html')
# Basic auth: the scheme name, then base64 of "user:password"
h.putheader('Authorization',
            'Basic ' + base64.encodestring(user+":"+password).strip())
h.endheaders()
h.getreply()
print h.getfile().read()

Tara (Family, season 5) -- And, uh, these are my-my friends.

HTTP Authentication - urllib2
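
A hedged sketch of how the same credentials might be handed to urllib2 instead of building the header by hand. It reuses the user and password from the manual example; the URL prefix is an assumption, and the try/except import is added so the sketch runs on Python 2 (urllib2) and Python 3 (urllib.request). No request is actually sent.

```python
try:                                    # Python 2
    import urllib2 as request
except ImportError:                     # Python 3
    import urllib.request as request

# Hypothetical protected area on the local machine.
url = 'http://localhost/protected/'
mgr = request.HTTPPasswordMgrWithDefaultRealm()
mgr.add_password(None, url, 'moshez', 's3kr1t')   # None matches any realm

# The handler answers a 401 challenge with the stored credentials.
opener = request.build_opener(request.HTTPBasicAuthHandler(mgr))
# opener.open(url + 'stuff.html') would perform the authenticated fetch.

# The manager hands out the credentials for any URL below the prefix.
print(mgr.find_user_password(None, url + 'stuff.html'))
```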


Xander (Ted, season 2) -- I am really jinxing the hell out of us.

Further Reading


Willow (Ted, season 2) -- 'Book-cracker Buffy', it's kind of her nickname.

Questions?

Buffy (Family, season 5) -- I let you come, now sit down and look studious.

Bonus Slides

Tara (Family, season 5) -- You always make me feel special.

Cookies


Ted (Ted, season 2) -- Who's up for dessert? I made chocolate-chip cookies!

urllib2 cookies
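
One way to let urllib2 keep track of cookies is a cookie jar attached to an opener. This is a sketch under assumptions: it uses cookielib, which only exists from Python 2.4 on (http.cookiejar in Python 3), and it is not necessarily the approach this slide originally showed. The Advogato example that follows handles the Set-Cookie header by hand instead.

```python
try:                                    # Python 2 (2.4 and later)
    import cookielib as cookiejar
    import urllib2 as request
except ImportError:                     # Python 3
    import http.cookiejar as cookiejar
    import urllib.request as request

jar = cookiejar.CookieJar()
opener = request.build_opener(request.HTTPCookieProcessor(jar))

# Every opener.open(...) call would now store cookies set by the server
# in `jar` and send matching cookies back on later requests.
print(len(jar))   # nothing has been fetched yet, so the jar is empty
```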


Joyce (Ted, season 2) -- Mm! Buffy, you've got to try one of these!

Logging Into Advogato


import urllib, urllib2

# urlencode lives in urllib; passing data makes the request a POST
u = urllib2.urlopen("http://advogato.org/acct/loginsub.html",
                    urllib.urlencode({'u': 'moshez',
                                      'pass': 'not my real pass'}))
# Keep only the name=value part of the Set-Cookie header
cookie = u.info()['set-cookie']
cookie = cookie[:cookie.find(';')]
r = urllib2.Request('http://advogato.org/diary/post.html',
            urllib.urlencode(
            {'entry': open('entry').read(), 'post': 'Post'}),
            {'Cookie': cookie})
urllib2.urlopen(r).read()

Anya (Family, season 5) -- I have a place in the world now.

On Being Nice - Robots


Willow (Ted, season 2) -- There were design features in that robot that pre-date...

Using robotparser

import robotparser, sys
rp = robotparser.RobotFileParser()
rp.set_url('http://www.example.com/robots.txt')
rp.read()
if not rp.can_fetch('', 'http://www.example.com/'):
    sys.exit(1)
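
To see how the rules are applied without touching the network, the parser can also be fed robots.txt lines directly through parse(). The rules and the 'mybot' agent name below are made up for illustration, and the try/except import is an assumption added so the sketch runs on Python 2 (robotparser) and Python 3 (urllib.robotparser).

```python
try:                                    # Python 2
    from robotparser import RobotFileParser
except ImportError:                     # Python 3
    from urllib.robotparser import RobotFileParser

# Made-up rules, parsed from memory instead of fetched with read().
rules = [
    "User-agent: *",
    "Disallow: /private/",
]
rp = RobotFileParser()
rp.parse(rules)

# 'mybot' is a hypothetical user agent; the "*" entry applies to it.
allowed = rp.can_fetch('mybot', 'http://www.example.com/index.html')
blocked = rp.can_fetch('mybot', 'http://www.example.com/private/x.html')
print(allowed)
print(blocked)
```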

Buffy (Ted, season 2) -- Tell me you didn't keep any parts.

webchecker


Willow (Ted, season 2) -- What do you mean, check him out?

websucker


Buffy (Ted, season 2) -- Find out his secrets, hack into his life.