阅读bittorrent源码时遇到的疑惑

home_king

UID: 8684
帖子: 36
积分: 82
在线时间: 1 小时

1^# home_king 发表于 2006-02-04 20:50

阅读bittorrent源码时遇到的疑惑

大家好。最近在阅读bittorrent4.3.6的源码，有一些和列表相关的疑惑，初学python，有很多不懂，请各位高手指教。谢谢！

第一个疑惑：

以下是相关代码（ktable.py）：
...
class KTable(object):
__slots__ = ('node', 'buckets')
"""local routing table for a kademlia like distributed hash table"""
def __init__(self, node):
      # this is the root node, a.k.a. US!
      self.node = node
      self.buckets = [KBucket([], 0L, 2L**HASH_LENGTH)]
      self.insertNode(node)

def _bucketIndexForInt(self, num):
      """the index of the bucket that should hold int"""
      return bisect_left(self.buckets, num)

def bucketForInt(self, num):
      return self.buckets[self._bucketIndexForInt(num)]

def findNodes(self, id, invalid=True):
      """
         return K nodes in our own local table closest to the ID.
      """

      if isinstance(id, str):
         num = hash.intify(id)
      elif isinstance(id, Node):
         num = id.num
      elif isinstance(id, int) or isinstance(id, long):
         num = id
      else:
         raise TypeError, "findNodes requires an int, string, or Node"

      nodes = []
      i = self._bucketIndexForInt(num)

      # if this node is already in our table then return it
      try:
         node = self.buckets.getNodeWithInt(num)
      except ValueError:
         pass
      else:
         return [node]

      # don't have the node, get the K closest nodes
      nodes = nodes + self.buckets.l
      if not invalid:
         nodes = [a for a in nodes if not a.invalid]
      if len(nodes) < K:
         # need more nodes
         min = i - 1
         max = i + 1
         while len(nodes) < K and (min >= 0 or max < len(self.buckets)):
            #ASw: note that this requires K be even
            if min >= 0:
                  nodes = nodes + self.buckets[min].l
            if max < len(self.buckets):
                  nodes = nodes + self.buckets[max].l
            min = min - 1
            max = max + 1
            if not invalid:
                  nodes = [a for a in nodes if not a.invalid]

      nodes.sort(lambda a, b, num=num: cmp(num ^ a.num, num ^ b.num))
      return nodes[:K]
...
class KBucket(object):
__slots__ = ('min', 'max', 'lastAccessed', 'l', 'index', 'invalid')
def __init__(self, contents, min, max):
      self.l = contents
      self.index = {}
      self.invalid = {}
      self.min = min
      self.max = max
      self.lastAccessed = time()
...

红色部分就是我疑惑的地方。
_bucketIndexForInt调用的bisect_left函数，其第一个参数就是一个列表，第二个参数应该属于该列表的成员类型。
（bisect_left来自bisect模块，详情参阅http://www.python.org/dev/doc/devel/lib/module-bisect.html）

但是，从代码可以看到，传递给bisect_left的第一个参数是KBucket类型的列表buckets，而第二个参数num，属于长整数类型。
为什么两个类型不一致呢？bisect_left似乎永远不会返回预想的值。

第二个疑惑：

def insertNode(self, node, contacted=1, nocheck=False):
      """
      this insert the node, returning None if successful, returns the oldest node in the bucket if it's full
      the caller responsible for pinging the returned node and calling replaceStaleNode if it is found to be stale!!
      contacted means that yes, we contacted THEM and we know the node is reachable
      """
      if node.id == NULL_ID or node.id == self.node.id:
         return

      if contacted:
         node.updateLastSeen()

      # get the bucket for this node
      i = self._bucketIndexForInt(node.num)
      # check to see if node is in the bucket already
      if self.buckets.hasNode(node):
         it = self.buckets.l.index(node.num)
         xnode = self.buckets.l[it]
         if contacted:
            node.age = xnode.age
            self.buckets.seenNode(node)
         elif xnode.lastSeen != 0 and xnode.port == node.port and xnode.host == node.host:
            xnode.updateLastSeen()
         return

上面的代码中，红色部分的语句调用列表l的index方法来返回指定列表成员的索引下标值。
但是列表的index方法的参数的类型应该是列表的成员类型才对，而这里，node.num是一个长整数类型，列表的成员类型是node。这样一来，程序会抛出ValueError异常啊！怎么会这样呢？

limodou

UID: 17491
帖子: 110
积分: 252
在线时间: 16 小时

2^# limodou 发表于 2006-02-04 23:40

问题一：

因为KBucket的代码不全，但我猜想之所以可以可能是因为KBucket象一个整数list一样工作。

问题二：

从KBucket的构造函数可以知道self.i = content ，而content正好是一个list。如果这个list是整数list的话，自然可以处理node.num整数了。而且要注意，python中的list不一定需要类型一样。只要存在即可：

[Copy to clipboard] [ - ]

CODE:

>>> a=['b', 1]
>>> a.index('b')
0
>>> a.index(1)
1

home_king

UID: 8684
帖子: 36
积分: 82
在线时间: 1 小时

3^# home_king 发表于 2006-02-05 01:09

谢谢limodou。

但是，我从代码发现，buckets列表的成员类型都是KBucket对象，而KBucket.l列表的成员类型都是node，但是在使用bisect_left()（对前者）和index()（对后者）的时候，却使用了不同类型的参数。这是我疑惑的地方。limodou可以再详细解释一下吗？

以下是KTable和KBucket两个类的完整代码。

[Copy to clipboard] [ - ]

CODE:

class KTable(object):
__slots__ = ('node', 'buckets')
"""local routing table for a kademlia like distributed hash table"""
def __init__(self, node):
      # this is the root node, a.k.a. US!
      self.node = node
      self.buckets = [KBucket([], 0L, 2L**HASH_LENGTH)]
      self.insertNode(node)

def _bucketIndexForInt(self, num):
      """the index of the bucket that should hold int"""
      return bisect_left(self.buckets, num)

def bucketForInt(self, num):
      return self.buckets[self._bucketIndexForInt(num)]

def findNodes(self, id, invalid=True):
      """
         return K nodes in our own local table closest to the ID.
      """

      if isinstance(id, str):
         num = hash.intify(id)
      elif isinstance(id, Node):
         num = id.num
      elif isinstance(id, int) or isinstance(id, long):
         num = id
      else:
         raise TypeError, "findNodes requires an int, string, or Node"

      nodes = []
      i = self._bucketIndexForInt(num)

      # if this node is already in our table then return it
      try:
         node = self.buckets[i].getNodeWithInt(num)
      except ValueError:
         pass
      else:
         return [node]

      # don't have the node, get the K closest nodes
      nodes = nodes + self.buckets[i].l
      if not invalid:
         nodes = [a for a in nodes if not a.invalid]
      if len(nodes) < K:
         # need more nodes
         min = i - 1
         max = i + 1
         while len(nodes) < K and (min >= 0 or max < len(self.buckets)):
            #ASw: note that this requires K be even
            if min >= 0:
                  nodes = nodes + self.buckets[min].l
            if max < len(self.buckets):
                  nodes = nodes + self.buckets[max].l
            min = min - 1
            max = max + 1
            if not invalid:
                  nodes = [a for a in nodes if not a.invalid]

      nodes.sort(lambda a, b, num=num: cmp(num ^ a.num, num ^ b.num))
      return nodes[:K]

def _splitBucket(self, a):
      diff = (a.max - a.min) / 2
      b = KBucket([], a.max - diff, a.max)
      self.buckets.insert(self.buckets.index(a.min) + 1, b)
      a.max = a.max - diff
      # transfer nodes to new bucket
      for anode in a.l[:]:
         if anode.num >= a.max:
            a.removeNode(anode)
            b.addNode(anode)

def replaceStaleNode(self, stale, new):
      """this is used by clients to replace a node returned by insertNode after
      it fails to respond to a Pong message"""
      i = self._bucketIndexForInt(stale.num)

      if self.buckets[i].hasNode(stale):
         self.buckets[i].removeNode(stale)
      if new and self.buckets[i].hasNode(new):
         self.buckets[i].seenNode(new)
      elif new:
         self.buckets[i].addNode(new)

      return

def insertNode(self, node, contacted=1, nocheck=False):
      """
      this insert the node, returning None if successful, returns the oldest node in the bucket if it's full
      the caller responsible for pinging the returned node and calling replaceStaleNode if it is found to be stale!!
      contacted means that yes, we contacted THEM and we know the node is reachable
      """
      if node.id == NULL_ID or node.id == self.node.id:
         return

      if contacted:
         node.updateLastSeen()

      # get the bucket for this node
      i = self._bucketIndexForInt(node.num)
      # check to see if node is in the bucket already
      if self.buckets[i].hasNode(node):
         it = self.buckets[i].l.index(node.num)
         xnode = self.buckets[i].l[it]
         if contacted:
            node.age = xnode.age
            self.buckets[i].seenNode(node)
         elif xnode.lastSeen != 0 and xnode.port == node.port and xnode.host == node.host:
            xnode.updateLastSeen()
         return

      # we don't have this node, check to see if the bucket is full
      if not self.buckets[i].bucketFull():
         # no, append this node and return
         self.buckets[i].addNode(node)
         return

      # full bucket, check to see if any nodes are invalid
      t = time()
      invalid = [x for x in self.buckets[i].invalid.values() if x.invalid]
      if len(invalid) and not nocheck:
         invalid.sort(ls)
         while invalid and not self.buckets[i].hasNode(invalid[0]):
            del(self.buckets[i].invalid[invalid[0].num])
            invalid = invalid[1:]
         if invalid and (invalid[0].lastSeen == 0 and invalid[0].fails < MAX_FAILURES):
            return invalid[0]
         elif invalid:
            self.replaceStaleNode(invalid[0], node)
            return

      stale =  [n for n in self.buckets[i].l if (t - n.lastSeen) > MIN_PING_INTERVAL]
      if len(stale) and not nocheck:
         stale.sort(ls)
         return stale[0]

      # bucket is full and all nodes are valid, check to see if self.node is in the bucket
      if not (self.buckets[i].min <= self.node < self.buckets[i].max):
         return

      # this bucket is full and contains our node, split the bucket
      if len(self.buckets) >= HASH_LENGTH:
         # our table is FULL, this is really unlikely
         print "Hash Table is FULL!  Increase K!"
         return

      self._splitBucket(self.buckets[i])

      # now that the bucket is split and balanced, try to insert the node again
      return self.insertNode(node, contacted)

def justSeenNode(self, id):
      """call this any time you get a message from a node
      it will update it in the table if it's there """
      try:
         n = self.findNodes(id)[0]
      except IndexError:
         return None
      else:
         tstamp = n.lastSeen
         n.updateLastSeen()
         bucket = self.bucketForInt(n.num)
         bucket.seenNode(n)
         return tstamp

def invalidateNode(self, n):
      """
         forget about node n - use when you know that node is invalid
      """
      n.invalid = True
      bucket = self.bucketForInt(n.num)
      bucket.invalidateNode(n)

def nodeFailed(self, node):
      """ call this when a node fails to respond to a message, to invalidate that node """
      try:
         n = self.findNodes(node.num)[0]
      except IndexError:
         return None
      else:
         if n.msgFailed() >= const.MAX_FAILURES:
            self.invalidateNode(n)

def numPeers(self):
      """ estimated number of connectable nodes in global table """
      return 8 * (2 ** (len(self.buckets) - 1))

class KBucket(object):
__slots__ = ('min', 'max', 'lastAccessed', 'l', 'index', 'invalid')
def __init__(self, contents, min, max):
      self.l = contents
      self.index = {}
      self.invalid = {}
      self.min = min
      self.max = max
      self.lastAccessed = time()

def touch(self):
      self.lastAccessed = time()

def lacmp(self, a, b):
      if a.lastSeen > b.lastSeen:
         return 1
      elif b.lastSeen > a.lastSeen:
         return -1
      return 0

def sort(self):
      self.l.sort(self.lacmp)

def getNodeWithInt(self, num):
      try:
         node = self.index[num]
      except KeyError:
         raise ValueError
      return node

def addNode(self, node):
      if len(self.l) >= K:
         return
      if self.index.has_key(node.num):
         return
      self.l.append(node)
      self.index[node.num] = node
      self.touch()

def removeNode(self, node):
      assert self.index.has_key(node.num)
      del(self.l[self.l.index(node.num)])
      del(self.index[node.num])
      try:
         del(self.invalid[node.num])
      except KeyError:
         pass
      self.touch()

def invalidateNode(self, node):
      self.invalid[node.num] = node

def seenNode(self, node):
      try:
         del(self.invalid[node.num])
      except KeyError:
         pass
      it = self.l.index(node.num)
      del(self.l[it])
      self.l.append(node)
      self.index[node.num] = node

def hasNode(self, node):
      return self.index.has_key(node.num)

def bucketFull(self):
      return len(self.l) >= K

def __repr__(self):
      return "<KBucket %d items (%d to %d)>" % (len(self.l), self.min, self.max)

## Comparators
# necessary for bisecting list of buckets with a hash expressed as an integer or a distance
# compares integer or node object with the bucket's range
def __lt__(self, a):
      if isinstance(a, Node): a = a.num
      return self.max <= a
def __le__(self, a):
      if isinstance(a, Node): a = a.num
      return self.min < a
def __gt__(self, a):
      if isinstance(a, Node): a = a.num
      return self.min > a
def __ge__(self, a):
      if isinstance(a, Node): a = a.num
      return self.max >= a
def __eq__(self, a):
      if isinstance(a, Node): a = a.num
      return self.min <= a and self.max > a
def __ne__(self, a):
      if isinstance(a, Node): a = a.num
      return self.min >= a or self.max < a

limodou

UID: 17491
帖子: 110
积分: 252
在线时间: 16 小时

4^# limodou 发表于 2006-02-05 01:32

这个我也只是猜测。要具体看bisect_left的代码是如何执行的：

[Copy to clipboard] [ - ]

CODE:

def bisect_left(a, x, lo=0, hi=None):
"""Return the index where to insert item x in list a, assuming a is sorted.

The return value i is such that all e in a[:i] have e < x, and all e in
a[i:] have e >= x.  So if x already appears in the list, i points just
before the leftmost x already there.

Optional args lo (default 0) and hi (default len(a)) bound the
slice of a to be searched.
"""

if hi is None:
      hi = len(a)
while lo < hi:
      mid = (lo+hi)//2
      if a[mid] < x: lo = mid+1
      else: hi = mid
return lo

上面是它的代码。可以看出对于list a来说，它只用到了比较，即 a[mid] < x 。而你可以注意KBucket类有如下的定义：

[Copy to clipboard] [ - ]

CODE:

def __lt__(self, a):
      if isinstance(a, Node): a = a.num
      return self.max <= a
def __le__(self, a):
      if isinstance(a, Node): a = a.num
      return self.min < a
def __gt__(self, a):
      if isinstance(a, Node): a = a.num
      return self.min > a
def __ge__(self, a):
      if isinstance(a, Node): a = a.num
      return self.max >= a
def __eq__(self, a):
      if isinstance(a, Node): a = a.num
      return self.min <= a and self.max > a
def __ne__(self, a):
      if isinstance(a, Node): a = a.num
      return self.min >= a or self.max < a

而这些就正好可以摸拟上面的操作。使得KBucket与一个整数list没有什么区别。

这就正好使用了python的特殊类方法。因此你完合可以把KBucket认为就是一个整数list。而代码中也不是直接使用node，要么通过node找到对应的整数值，要么使用node.num。好象是这样，没有太细看。

home_king

UID: 8684
帖子: 36
积分: 82
在线时间: 1 小时

5^# home_king 发表于 2006-02-05 10:52

嗯。终于明白了。我真是疏忽大意。
初学python，以后还有很多问题要请教各位高手。谢谢！