JohnLyu's blog

Orange Juice Office, Eorzea Branch


Lab: Implementing Wireshark by Hand

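To be honest, I did not fully understand the instructor's requirements. Building a Wireshark-like tool from scratch is simply too large a project.
Implementing it at the dumpcap level would be more manageable, but still a lot of trouble. This post therefore focuses on parsing a packet step by step, from the link layer up to the transport layer.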

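The code favors demonstration and readability; it does not represent the structure you should use in practice or in a real project.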

pyshark is not Wireshark, so using it does not count as using Wireshark directly.

How pyshark works

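pyshark actually calls the system's dumpcap to write the packet stream of a given network interface into a pipe.
It then calls tshark to read the binary data stream from that pipe and parse it into a capture structure.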
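
To make "reading a binary packet stream from a pipe" concrete, here is a minimal sketch that writes and then reads back a capture stream in the classic pcap format. It makes several assumptions: dumpcap writes pcapng by default, so classic pcap is used here only because its layout is short enough to show; `io.BytesIO` stands in for the real pipe; and the names `write_pcap` and `read_pcap` are hypothetical helpers, not part of pyshark or tshark.

```python
import io
import struct

PCAP_MAGIC = 0xA1B2C3D4  # classic pcap with microsecond timestamps


def write_pcap(packets, linktype=1):
    """Build a little-endian classic pcap byte stream (stand-in for the capture side)."""
    out = io.BytesIO()
    # global header: magic, version 2.4, thiszone, sigfigs, snaplen, link type
    out.write(struct.pack("<IHHiIII", PCAP_MAGIC, 2, 4, 0, 0, 65535, linktype))
    for ts_sec, payload in packets:
        # record header: ts_sec, ts_usec, captured length, original length
        out.write(struct.pack("<IIII", ts_sec, 0, len(payload), len(payload)))
        out.write(payload)
    return out.getvalue()


def read_pcap(stream):
    """Yield (ts_sec, raw_bytes) for every record in a little-endian pcap stream."""
    header = stream.read(24)
    magic = struct.unpack("<I", header[:4])[0]
    if magic != PCAP_MAGIC:
        raise ValueError("not a little-endian classic pcap stream")
    while True:
        record = stream.read(16)
        if len(record) < 16:
            break  # end of stream
        ts_sec, _ts_usec, incl_len, _orig_len = struct.unpack("<IIII", record)
        yield ts_sec, stream.read(incl_len)


pipe = io.BytesIO(write_pcap([(1, b"\x01\x02"), (2, b"\xff")]))
for ts, raw in read_pcap(pipe):
    print(ts, raw.hex())
```

The raw bytes yielded per record are the same kind of input the parsers in this post start from.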

This lab aims to implement the tshark part: it takes only the raw binary packets from pyshark's interface and then does all of the parsing itself.
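
The parsing approach used in this post can be sketched in a few lines: turn the raw bytes into a string of bits, then cut that string field by field according to a header layout. The helper names `to_bits` and `slice_fields` and the sample frame are made up for illustration; the layout matches the Ethernet II header used later.

```python
def to_bits(raw: bytes) -> str:
    """Render bytes as a string of '0'/'1' characters, 8 per byte."""
    return "".join(f"{byte:08b}" for byte in raw)


def slice_fields(bits: str, layout: dict):
    """Cut `bits` into named fields; `layout` maps field name -> width in bits."""
    fields = {}
    pos = 0
    for name, width in layout.items():
        fields[name] = bits[pos:pos + width]
        pos += width
    return fields, bits[pos:]  # parsed header fields and the remaining payload bits


# hand-built frame: broadcast destination, made-up source MAC, EtherType 0x86DD (IPv6)
frame = bytes.fromhex("ffffffffffff" "001122334455" "86dd") + b"payload"
fields, rest = slice_fields(to_bits(frame), {"DMAC": 48, "SMAC": 48, "Type": 16})
print(hex(int(fields["Type"], 2)))  # 0x86dd
```

Working on a bit string is slow compared with `struct` or integer masks, but it makes fields that are not byte-aligned (such as the 3-bit IPv4 flags) trivial to slice.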

Procedure and code

Note: I only implemented the IPv6 part of the network layer; classmates, please implement the IPv4 network layer yourselves.

Implemented: an Ethernet_II parser, an IPv6 parser, an IPv4 parser, a TCP parser, and finally a packet analysis tool.
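
Before the full listing, the layer-by-layer walk can be checked offline against a hand-built frame. The sketch below is not the code from this post: it uses `struct` and byte offsets rather than bit strings, handles only Ethernet II, IPv4, and TCP, and the function name `parse_frame` and the sample frame are assumptions made for illustration.

```python
import ipaddress
import struct


def parse_frame(frame: bytes) -> dict:
    """Walk Ethernet II -> IPv4 -> TCP and return the addressing fields."""
    _dmac, _smac, ethertype = struct.unpack("!6s6sH", frame[:14])
    if ethertype != 0x0800:
        raise ValueError(f"EtherType {ethertype:#06x} is not handled in this sketch")

    ip = frame[14:]
    ihl_bytes = (ip[0] & 0x0F) * 4  # IHL counts 32-bit words
    if ip[9] != 0x06:
        raise ValueError("only TCP is handled in this sketch")

    tcp = ip[ihl_bytes:]
    s_port, d_port = struct.unpack("!HH", tcp[:4])
    data_offset_bytes = (tcp[12] >> 4) * 4  # data offset also counts 32-bit words
    return {
        "s_ip": str(ipaddress.IPv4Address(ip[12:16])),
        "d_ip": str(ipaddress.IPv4Address(ip[16:20])),
        "s_port": s_port,
        "d_port": d_port,
        "payload": tcp[data_offset_bytes:],
    }


# hand-built sample: 14-byte Ethernet header, 20-byte IPv4 header, 20-byte TCP header
eth = bytes.fromhex("aabbccddeeff") + bytes.fromhex("112233445566") + struct.pack("!H", 0x0800)
ip_header = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 45, 0, 0, 64, 6, 0,
                        ipaddress.IPv4Address("192.168.199.163").packed,
                        ipaddress.IPv4Address("202.38.64.11").packed)
tcp_header = struct.pack("!HHIIHHHH", 59352, 80, 0, 0, (5 << 12) | 0x002, 65535, 0, 0)
info = parse_frame(eth + ip_header + tcp_header + b"GET /")
print(info)
```

Checksums are left at zero in the sample because nothing in this sketch verifies them.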

cap_lo0.py
# /*
# * @Author: John Lyu
# * @Date: 2020-10-10 19:44:59
# * @Last Modified by: John Lyu
# * @Last Modified time: 2020-10-10 19:44:59
# */
"""Create dynamic network layers of
Application Layer
Transport Layer
Network Layer
Data Link Layer
Physical Layer (No Need to implement)
"""

import ipaddress


class LayerParser(object):
    """
    Base layer class
    """

    def __init__(self, data):
        """
        data: this layer's bytes as a string of '0'/'1' characters
        """
        self.data = data
        self.header_struct = self.get_header_struct()
        self.sliced_data = {}
        self.analysis()

    def get_header_struct(self):
        # maps field name -> field length in bits
        # dicts keep insertion order as a language guarantee from Python 3.7 on
        return {}

    def pretty_print(self):
        for part, data in self.sliced_data.items():
            print(f"{part}: {hex(int(data, 2))}")

    def analysis(self):
        # walk the header fields in order and cut the bit string into slices
        current_pos = 0
        for part, length in self.header_struct.items():
            end_pos = current_pos + length
            self.sliced_data[part] = self.data[current_pos:end_pos]
            current_pos = end_pos

    def to_upper_layer(self):
        # subclasses return the payload bits handed to the next layer
        pass


class EthernetParser(LayerParser):
    """
    Ethernet_II Layer
    refer to https://en.wikipedia.org/wiki/Ethernet_frame
    in this layer, MAC address is used as id
    """

    def get_header_struct(self):
        return {
            "DMAC": 6 * 8,
            "SMAC": 6 * 8,
            "Type": 2 * 8,
            # there may be extended header here, but not implemented today
        }

    def next_protocal(self):
        type_id = int(self.sliced_data["Type"], 2)
        if type_id == 0x0800:
            return "ipv4"
        elif type_id == 0x0806:
            return "arp"
        elif type_id == 0x86DD:
            return "ipv6"

    def to_upper_layer(self):
        return self.data[14 * 8:]


class LoopbackParser(LayerParser):
    """
    Virtual Ethernet Layer, implemented for localhost
    the first 4 bytes are
    0000 1e 00 00 00
    """
    HINT = "00011110" + "0" * (3 * 8)

    def to_upper_layer(self):
        return self.data[4 * 8:]

    def next_protocal(self):
        return "ipv6"


class IPv6Parser(LayerParser):
    """
    NetworkParser, deal with IPv6
    refer to https://en.wikipedia.org/wiki/IPv6_packet
    """

    def get_header_struct(self):
        return {
            "protocol_version": 1 * 4,
            "traffic_class": 2 * 4,
            "flow_label": 5 * 4,
            "payload_length": 4 * 4,
            "next_protocol": 2 * 4,  # 0x06 is TCP and 0x11 is UDP
            "hop_limit": 2 * 4,
            "source_address": 32 * 4,
            "dst_address": 32 * 4,
            # there may be extended header here, but not implemented today
        }

    def to_upper_layer(self):
        return self.data[40 * 8:]

    @property
    def s_ip(self):
        return ipaddress.IPv6Address(
            int(self.sliced_data["source_address"], 2))

    @property
    def d_ip(self):
        return ipaddress.IPv6Address(
            int(self.sliced_data["dst_address"], 2))


class IPv4Parser(LayerParser):
    """
    NetworkParser, deal with IPv4
    refer to https://en.wikipedia.org/wiki/IPv4#Header
    """

    def get_header_struct(self):
        return {
            "protocol_version": 1 * 4,
            "IHL": 1 * 4,
            "DSCP": 6,
            "ECN": 2,
            "Total_Length": 2 * 8,
            "Identification": 2 * 8,
            "Flags": 3,
            "Fragment Offset": 2 * 8 - 3,
            "Time To Live": 8,
            "next_protocol": 8,
            "Header_Checksum": 2 * 8,
            "source_address": 4 * 8,
            "dst_address": 4 * 8,
            # there may be extended header here, but not implemented today
        }

    def analysis(self):
        super().analysis()
        # IHL counts 32-bit words, so this is the header length in bits
        self.header_length = int(self.sliced_data["IHL"], 2) * 32

    def to_upper_layer(self):
        return self.data[self.header_length:]

    @property
    def s_ip(self):
        return ipaddress.IPv4Address(
            int(self.sliced_data["source_address"], 2))

    @property
    def d_ip(self):
        return ipaddress.IPv4Address(
            int(self.sliced_data["dst_address"], 2))


class TCPParser(LayerParser):
    """
    TCP layer data Parser
    refer to https://en.wikipedia.org/wiki/Transmission_Control_Protocol
    """
    # todo get full tcp request instead of one packet
    def get_header_struct(self):
        return {
            "source_port": 8 * 2,
            "dst_port": 8 * 2,
            "sequence_number": 8 * 4,
            "ack_number": 8 * 4,
            "data_offset": 4,
            "reserved": 3,
            "ns": 1,
            "cwr": 1,
            "ece": 1,
            "urg": 1,
            "ack": 1,
            "psh": 1,
            "rst": 1,
            "syn": 1,
            "fin": 1,
            "window_size": 8 * 2,
            "checksum": 8 * 2,
            "urgent_pointer": 8 * 2
        }

    def analysis(self):
        super().analysis()
        # data offset counts 32-bit words, so this is the header length in bits
        self.header_length = int(self.sliced_data["data_offset"], 2) * 32

    def to_upper_layer(self):
        # fixed: header_length is an instance attribute, not a global name
        return self.data[self.header_length:]


class MyShark(object):
    """
    Oh My WireShark!
    """

    def __init__(self, packets):
        self.packets = packets

    def get_stream(self, packet):
        """collect every packet with the same source and destination as `packet`"""
        stream = [packet]
        s_ip = packet.s_ip
        d_ip = packet.d_ip
        s_port = packet.s_port
        d_port = packet.d_port
        # todo detect FIN
        for p in self.packets:
            # skip `packet` itself so that it is not added twice
            if p is not packet and all([
                    s_ip == p.s_ip,
                    d_ip == p.d_ip,
                    s_port == p.s_port,
                    d_port == p.d_port,
            ]):
                stream.append(p)
        return stream

    def summary(self):
        """pretty print summarize info"""
        # could use tree here for better performance
        tag_dict = {}
        # fixed: iterate over self.packets rather than the global `shark`
        for p in self.packets:
            src = str(p.s_ip) + ':' + str(p.s_port)
            dst = str(p.d_ip) + ':' + str(p.d_port)
            tag = f"source: {src:<23} dst: {dst:<23} protocol: {p.protocol}"
            tag_dict[tag] = tag_dict.get(tag, 0) + 1
        for t, c in tag_dict.items():
            print(t, " Count: ", c)


class MyPacket(object):
    """
    Store all layer info in this class
    """

    def __init__(self, pyshark_packet):
        """
        init packet with bin string
        """
        self.packet = pyshark_packet
        rb = self.packet.get_raw_packet()
        raw_hex = rb.hex()
        # zfill restores the leading zero bits that int() drops
        raw_bin = bin(int(raw_hex, base=16))[2:].zfill(len(rb) * 8)

        # I am not sure how to identify the link-layer protocol
        if raw_bin.startswith(LoopbackParser.HINT):
            self.ethernet_layer = LoopbackParser(raw_bin)
        else:
            self.ethernet_layer = EthernetParser(raw_bin)

        # switch ip layer version
        if self.ethernet_layer.next_protocal() == "ipv6":
            data = self.ethernet_layer.to_upper_layer()
            self.ip_layer = IPv6Parser(data)
        elif self.ethernet_layer.next_protocal() == "ipv4":
            data = self.ethernet_layer.to_upper_layer()
            self.ip_layer = IPv4Parser(data)
        else:
            # fixed: call next_protocal() so the message shows the protocol name
            raise ValueError("protocol {} is not implemented".format(
                self.ethernet_layer.next_protocal()))

        # the payload is always sliced as TCP; UDP keeps its ports in the same
        # first 4 bytes, so s_port and d_port below still hold for UDP packets
        tcp_data = self.ip_layer.to_upper_layer()
        self.tcp_layer = TCPParser(tcp_data)

    @property
    def s_ip(self):
        return self.ip_layer.s_ip

    @property
    def d_ip(self):
        return self.ip_layer.d_ip

    @property
    def s_port(self):
        return int(self.tcp_layer.sliced_data["source_port"], 2)

    @property
    def d_port(self):
        return int(self.tcp_layer.sliced_data["dst_port"], 2)

    @property
    def protocol(self):
        code = self.ip_layer.sliced_data["next_protocol"]
        code = int(code, 2)
        if code == 0x06:
            return "TCP"
        elif code == 0x11:
            return "UDP"
        else:
            return "Unknown"


if __name__ == "__main__":
    import pyshark
    # from bitstring import BitArray
    from pathlib import Path
    import os

    fp = str(Path.home()) + '/Downloads/lo0_1.pcapng.gz'
    if os.path.exists(fp):
        # use_json and include_raw are set so that get_raw_packet() has raw bytes
        capture = pyshark.FileCapture(fp,
                                      use_json=True,
                                      include_raw=True)
    else:
        # use this code to capture packet from lo0
        capture = pyshark.LiveCapture(interface='lo0', output_file=fp)
        capture.set_debug()
        capture.sniff(timeout=10)

    packets = []
    for p in capture:
        try:
            packets.append(MyPacket(p))
        except ValueError:
            # skip packets whose network-layer protocol is not implemented (e.g. ARP)
            continue
    shark = MyShark(packets)
    shark.summary()
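
The `LoopbackParser` above always reports IPv6. If the capture uses the BSD loopback encapsulation (an assumption about how `lo0` traffic is framed, based on libpcap's description of the NULL link type), the 4-byte header holds an address-family number in the capturing host's byte order: 2 for IPv4, and 24, 28, or 30 for IPv6 depending on the operating system. A minimal sketch of decoding it, with the hypothetical helper name `loopback_next_protocol`:

```python
import struct

AF_INET = 2
# AF_INET6 differs per OS: 24 (NetBSD/OpenBSD), 28 (FreeBSD), 30 (macOS)
AF_INET6_VALUES = {24, 28, 30}


def loopback_next_protocol(frame: bytes) -> str:
    """Return 'ipv4' or 'ipv6' from a 4-byte BSD loopback header."""
    # the field is in host byte order, so try little-endian first, then big-endian
    for fmt in ("<I", ">I"):
        (family,) = struct.unpack(fmt, frame[:4])
        if family == AF_INET:
            return "ipv4"
        if family in AF_INET6_VALUES:
            return "ipv6"
    raise ValueError("unrecognized loopback address family")


print(loopback_next_protocol(bytes.fromhex("1e000000")))  # ipv6
print(loopback_next_protocol(bytes.fromhex("02000000")))  # ipv4
```

The `1e 00 00 00` prefix that `LoopbackParser.HINT` matches is the little-endian encoding of 30, which is why the hint only catches IPv6 traffic.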

Results

IPv6 capture results

Python 3.8.5 (default, Sep 15 2020, 15:27:46) 
Type 'copyright', 'credits' or 'license' for more information
IPython 7.18.1 -- An enhanced Interactive Python. Type '?' for help.
source: ::1:53415 dst: ::1:4000 protocol: TCP Count: 13
source: ::1:4000 dst: ::1:53415 protocol: TCP Count: 16
source: ::1:53416 dst: ::1:4000 protocol: TCP Count: 9
source: ::1:4000 dst: ::1:53416 protocol: TCP Count: 9
source: ::1:53418 dst: ::1:4000 protocol: TCP Count: 12
source: ::1:4000 dst: ::1:53418 protocol: TCP Count: 12
source: ::1:53419 dst: ::1:4000 protocol: TCP Count: 8
source: ::1:4000 dst: ::1:53419 protocol: TCP Count: 8
source: ::1:53420 dst: ::1:4000 protocol: TCP Count: 9
source: ::1:4000 dst: ::1:53420 protocol: TCP Count: 9
source: ::1:53421 dst: ::1:4000 protocol: TCP Count: 11
source: ::1:4000 dst: ::1:53421 protocol: TCP Count: 15

IPv4 capture results

source: 192.168.199.163:59324 dst: 202.160.128.238:443 protocol: TCP Count: 1
source: 192.168.199.163:59325 dst: 202.160.128.238:443 protocol: TCP Count: 1
source: 192.168.199.163:59335 dst: 202.160.128.238:443 protocol: TCP Count: 1
source: 192.168.199.163:59318 dst: 176.122.148.37:443 protocol: TCP Count: 1
source: 192.168.199.163:49160 dst: 203.208.43.97:443 protocol: UDP Count: 1
source: 203.208.43.97:443 dst: 192.168.199.163:49160 protocol: UDP Count: 1
source: 192.168.199.163:59165 dst: 101.200.85.151:23446 protocol: TCP Count: 1
source: 192.168.199.163:59337 dst: 202.108.23.152:443 protocol: TCP Count: 1
source: 192.168.199.163:59352 dst: 202.38.64.11:80 protocol: TCP Count: 6
source: 192.168.199.163:59353 dst: 202.38.64.11:80 protocol: TCP Count: 2
source: 202.38.64.11:80 dst: 192.168.199.163:59352 protocol: TCP Count: 5
source: 202.38.64.11:80 dst: 192.168.199.163:59353 protocol: TCP Count: 1
source: 192.168.199.163:59001 dst: 113.96.12.217:8080 protocol: TCP Count: 3
source: 192.168.199.163:59083 dst: 81.70.95.109:22 protocol: TCP Count: 2
source: 192.168.199.163:59075 dst: 52.139.250.253:443 protocol: TCP Count: 3
source: 81.70.95.109:22 dst: 192.168.199.163:59083 protocol: TCP Count: 1
source: 113.96.12.217:8080 dst: 192.168.199.163:59001 protocol: TCP Count: 2
source: 101.200.85.151:23446 dst: 192.168.199.163:59272 protocol: TCP Count: 5
source: 192.168.199.163:59272 dst: 101.200.85.151:23446 protocol: TCP Count: 5
source: 101.200.85.151:23446 dst: 192.168.199.163:59276 protocol: TCP Count: 3
source: 192.168.199.163:59276 dst: 101.200.85.151:23446 protocol: TCP Count: 3
source: 192.168.199.163:59326 dst: 216.58.200.46:443 protocol: TCP Count: 1
source: 52.139.250.253:443 dst: 192.168.199.163:59075 protocol: TCP Count: 1
source: 192.168.199.163:59327 dst: 216.58.200.46:443 protocol: TCP Count: 1