We do not have DCB PFC so the traffic is managed by PAUSE frames by default.
The kernel has been recompiled to include your change. I ran the same
test after the kernel change, i.e. running 10 fio sessions from
initiator to 10 drives on target. The target machine hung after a few
minutes.
On the initiator side the messages look like:
4453 May 27 10:44:14 poc1 kernel: [ 1726.288288] host7: fcp: 00061e:
Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
4454 May 27 10:44:14 poc1 kernel: [ 1726.292240] host7: fcp: 00061e:
Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
4455 May 27 10:44:14 poc1 kernel: [ 1726.295246] host7: fcp: 00061e:
Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
4456 May 27 10:44:14 poc1 kernel: [ 1726.296245] host7: fcp: 00061e:
Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
4457 May 27 10:44:14 poc1 kernel: [ 1726.298249] host7: fcp: 00061e:
Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
4458 May 27 10:44:14 poc1 kernel: [ 1726.300288] host7: fcp: 00061e:
Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
4459 May 27 10:44:14 poc1 kernel: [ 1726.303268] host7: fcp: 00061e:
Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
4460 May 27 10:44:14 poc1 kernel: [ 1726.303273] host7: fcp: 00061e:
Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
4461 May 27 10:44:14 poc1 kernel: [ 1726.304297] host7: fcp: 00061e:
Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
4462 May 27 10:44:14 poc1 kernel: [ 1726.306257] host7: fcp: 00061e:
Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
4463 May 27 10:44:14 poc1 kernel: [ 1726.307293] host7: fcp: 00061e:
Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
4464 May 27 10:44:14 poc1 kernel: [ 1726.307298] host7: fcp: 00061e:
Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
4465 May 27 10:44:14 poc1 kernel: [ 1726.307301] host7: fcp: 00061e:
Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
4466 May 27 10:44:14 poc1 kernel: [ 1726.310317] host7: fcp: 00061e:
Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
4467 May 27 10:44:14 poc1 kernel: [ 1726.313257] host7: fcp: 00061e:
Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
4468 May 27 10:44:14 poc1 kernel: [ 1726.315297] host7: fcp: 00061e:
xid 01cd-0a84: DDP I/O in fc_fcp_recv_data set ERROR
4469 May 27 10:44:14 poc1 kernel: [ 1726.315302] host7: xid 1cd:
f_ctl 90000 seq 1
4470 May 27 10:44:14 poc1 kernel: [ 1726.315666] host7: xid 1cd: BLS
rctl 85 - BLS reject received
4471 May 27 10:44:14 poc1 kernel: [ 1726.316295] host7: fcp: 00061e:
Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
4472 May 27 10:44:14 poc1 kernel: [ 1726.317260] host7: fcp: 00061e:
Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
4473 May 27 10:44:14 poc1 kernel: [ 1726.317266] host7: fcp: 00061e:
Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
4474 May 27 10:44:14 poc1 kernel: [ 1726.317417] host7: fcp: 00061e:
Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
4475 May 27 10:44:14 poc1 kernel: [ 1726.320303] host7: fcp: 00061e:
Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
4476 May 27 10:44:14 poc1 kernel: [ 1726.321301] host7: fcp: 00061e:
Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
4477 May 27 10:44:14 poc1 kernel: [ 1726.323264] host7: fcp: 00061e:
Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
4478 May 27 10:44:14 poc1 kernel: [ 1726.323299] host7: fcp: 00061e:
Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
4479 May 27 10:44:14 poc1 kernel: [ 1726.324307] host7: fcp: 00061e:
Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
4480 May 27 10:44:14 poc1 kernel: [ 1726.327271] host7: fcp: 00061e:
Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
4481 May 27 10:44:14 poc1 kernel: [ 1726.327301] host7: fcp: 00061e:
Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
4482 May 27 10:44:14 poc1 kernel: [ 1726.329274] host7: fcp: 00061e:
Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
4483 May 27 10:44:14 poc1 kernel: [ 1726.330306] host7: fcp: 00061e:
Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
4484 May 27 10:44:14 poc1 kernel: [ 1726.331273] host7: fcp: 00061e:
Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
4485 May 27 10:44:14 poc1 kernel: [ 1726.333270] host7: fcp: 00061e:
Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
4486 May 27 10:44:14 poc1 kernel: [ 1726.333313] host7: fcp: 00061e:
Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
4487 May 27 10:44:14 poc1 kernel: [ 1726.338276] host7: fcp: 00061e:
Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
4488 May 27 10:44:14 poc1 kernel: [ 1726.338316] host7: fcp: 00061e:
Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
4489 May 27 10:44:14 poc1 kernel: [ 1726.340320] host7: fcp: 00061e:
Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
4490 May 27 10:44:14 poc1 kernel: [ 1726.341313] host7: fcp: 00061e:
Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
4491 May 27 10:44:14 poc1 kernel: [ 1726.342286] host7: fcp: 00061e:
Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
4492 May 27 10:44:14 poc1 kernel: [ 1726.343312] host7: fcp: 00061e:
Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
4493 May 27 10:44:14 poc1 kernel: [ 1726.346324] host7: fcp: 00061e:
Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
4494 May 27 10:44:14 poc1 kernel: [ 1726.347283] host7: fcp: 00061e:
Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
4495 May 27 10:44:14 poc1 kernel: [ 1726.350323] host7: fcp: 00061e:
Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
4496 May 27 10:44:14 poc1 kernel: [ 1726.350326] host7: fcp: 00061e:
Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
4497 May 27 10:44:14 poc1 kernel: [ 1726.352288] host7: fcp: 00061e:
Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
4498 May 27 10:44:14 poc1 kernel: [ 1726.352289] host7: fcp: 00061e:
Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
4499 May 27 10:44:14 poc1 kernel: [ 1726.352942] host7: fcp: 00061e:
xid 01ee-0c48: DDP I/O in fc_fcp_recv_data set ERROR
4500 May 27 10:44:14 poc1 kernel: [ 1726.352946] host7: xid 1ee:
f_ctl 90000 seq 1
4501 May 27 10:44:14 poc1 kernel: [ 1726.353288] host7: fcp: 00061e:
Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
4502 May 27 10:44:14 poc1 kernel: [ 1726.353320] host7: fcp: 00061e:
Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
I see much less Exchange timer messages than last time. Here are all
the instances of them:
6521 May 27 10:44:46 poc1 kernel: [ 1757.762301] host7: fcp: 00061e:
xid 0181-090d: DDP I/O in fc_fcp_recv_data set ERROR
6522 May 27 10:44:46 poc1 kernel: [ 1757.762306] host7: xid 181:
f_ctl 90000 seq 1
6523 May 27 10:44:46 poc1 kernel: [ 1757.763257] host7: fcp: 00061e:
xid 0005-0327: DDP I/O in fc_fcp_recv_data set ERROR
6524 May 27 10:44:46 poc1 kernel: [ 1757.763261] host7: xid 5:
f_ctl 90000 seq 1
6525 May 27 10:44:46 poc1 kernel: [ 1757.763347] host7: fcp: 00061e:
xid 0060-0dce: DDP I/O in fc_fcp_recv_data set ERROR
6526 May 27 10:44:46 poc1 kernel: [ 1757.763351] host7: xid 60:
f_ctl 90000 seq 1
6527 May 27 10:44:46 poc1 kernel: [ 1757.763617] host7: xid 12c:
exch: BLS rctl 84 - BLS accept
6528 May 27 10:44:46 poc1 kernel: [ 1757.763627] host7: fcp: 00061e:
Returning DID_ERROR to scsi-ml due to FC_CMD_ABORTED
6529 May 27 10:44:46 poc1 kernel: [ 1757.763632] host7: xid 12c:
Exchange timer armed : 10000 msecs
6530 May 27 10:44:46 poc1 kernel: [ 1757.763813] host7: fcp: 00061e:
xid 00cc-090d: DDP I/O in fc_fcp_recv_data set ERROR
6531 May 27 10:44:46 poc1 kernel: [ 1757.763817] host7: xid cc:
f_ctl 90000 seq 1
6532 May 27 10:44:46 poc1 kernel: [ 1757.763983] host7: xid 60: BLS
rctl 85 - BLS reject received
6533 May 27 10:44:46 poc1 kernel: [ 1757.765447] host7: xid 14d: BLS
rctl 85 - BLS reject received
6534 May 27 10:44:46 poc1 kernel: [ 1757.766258] host7: xid 18b: BLS
rctl 85 - BLS reject received
6535 May 27 10:44:46 poc1 kernel: [ 1757.767116] host7: xid ed: BLS
rctl 85 - BLS reject received
6536 May 27 10:44:46 poc1 kernel: [ 1757.767144] host7: xid 21: BLS
rctl 85 - BLS reject received
6537 May 27 10:44:46 poc1 kernel: [ 1757.768161] host7: xid 122: BLS
rctl 85 - BLS reject received
6538 May 27 10:44:46 poc1 kernel: [ 1757.768184] host7: xid 45: BLS
rctl 85 - BLS reject received
6539 May 27 10:44:46 poc1 kernel: [ 1757.768437] host7: xid 16f: BLS
rctl 85 - BLS reject received
6540 May 27 10:44:46 poc1 kernel: [ 1757.768880] host7: xid 14c: BLS
rctl 85 - BLS reject received
6541 May 27 10:44:46 poc1 kernel: [ 1757.772813] host7: xid 8a: BLS
rctl 85 - BLS reject received
6542 May 27 10:44:46 poc1 kernel: [ 1757.778209] host7: xid 181: BLS
rctl 85 - BLS reject received
6543 May 27 10:44:46 poc1 kernel: [ 1757.778226] host7: xid 16b: BLS
rctl 85 - BLS reject received
6544 May 27 10:44:46 poc1 kernel: [ 1757.778905] host7: xid cc: BLS
rctl 85 - BLS reject received
6545 May 27 10:44:56 poc1 kernel: [ 1767.793465] host7: xid 12c:
Exchange timed out
6546 May 27 10:44:56 poc1 kernel: [ 1767.793478] host7: xid e65:
Exchange timer armed : 2000 msecs
6547 May 27 10:44:56 poc1 kernel: [ 1767.793861] host7: xid e65:
Exchange timer canceled
6548 May 27 10:44:56 poc1 kernel: [ 1767.793866] host7: xid 12c:
LS_RJT for RRQ
6549 May 27 10:45:16 poc1 kernel: [ 1788.033856] host7: scsi:
Resetting rport (00061e)
6550 May 27 10:45:16 poc1 kernel: [ 1788.034071] host7: scsi: lun
reset to lun 0 completed
and
7468 May 27 10:45:17 poc1 kernel: [ 1789.288870] host7: fcp: 00061e:
xid 06ae-0985: fsp=ffff880c0e176700:lso:blen=1000 lso_max=0x10000
t_blen=1000
7469 May 27 10:45:17 poc1 kernel: [ 1789.288874] host7: xid 6ae:
f_ctl 90000 seq 1
7470 May 27 10:45:17 poc1 kernel: [ 1789.291411] host7: fcp: 00061e:
xid 068e-07a3: fsp=ffff880c0e177400:lso:blen=1000 lso_max=0x10000
t_blen=1000
7471 May 27 10:45:17 poc1 kernel: [ 1789.291415] host7: xid 68e:
f_ctl 90000 seq 1
7472 May 27 10:45:17 poc1 kernel: [ 1789.291425] host7: fcp: 00061e:
xid 0c2e-0b04: fsp=ffff880c1cc4c800:lso:blen=1000 lso_max=0x10000
t_blen=1000
7473 May 27 10:45:17 poc1 kernel: [ 1789.291428] host7: xid c2e:
f_ctl 90000 seq 1
7474 May 27 10:45:17 poc1 kernel: [ 1789.296282] host7: xid 1c4: BLS
rctl 85 - BLS reject received
7475 May 27 10:45:17 poc1 kernel: [ 1789.296284] host7: xid 8c: BLS
rctl 85 - BLS reject received
7476 May 27 10:45:17 poc1 kernel: [ 1789.297554] host7: xid 6ae:
exch: BLS rctl 84 - BLS accept
7477 May 27 10:45:17 poc1 kernel: [ 1789.297559] host7: fcp: 00061e:
Returning DID_ERROR to scsi-ml due to FC_CMD_ABORTED
7478 May 27 10:45:17 poc1 kernel: [ 1789.297564] host7: xid 6ae:
Exchange timer armed : 10000 msecs
7479 May 27 10:45:17 poc1 kernel: [ 1789.298873] host7: xid ef: BLS
rctl 85 - BLS reject received
7480 May 27 10:45:17 poc1 kernel: [ 1789.298887] host7: fcp: 00061e:
xid 01cf-0e8e: DDP I/O in fc_fcp_recv_data set ERROR
7481 May 27 10:45:17 poc1 kernel: [ 1789.298891] host7: xid 1cf:
f_ctl 90000 seq 1
7482 May 27 10:45:17 poc1 kernel: [ 1789.299318] host7: xid 1cf: BLS
rctl 85 - BLS reject received
7483 May 27 10:45:17 poc1 kernel: [ 1789.299390] host7: fcp: 00061e:
xid 00c4-048c: DDP I/O in fc_fcp_recv_data set ERROR
7484 May 27 10:45:17 poc1 kernel: [ 1789.299394] host7: xid c4:
f_ctl 90000 seq 1
7485 May 27 10:45:17 poc1 kernel: [ 1789.299716] host7: xid c4: BLS
rctl 85 - BLS reject received
7486 May 27 10:45:17 poc1 kernel: [ 1789.299733] host7: xid e6: BLS
rctl 85 - BLS reject received
7487 May 27 10:45:17 poc1 kernel: [ 1789.299762] host7: xid 87: BLS
rctl 85 - BLS reject received
7488 May 27 10:45:17 poc1 kernel: [ 1789.299767] host7: xid 107: BLS
rctl 85 - BLS reject received
7489 May 27 10:45:17 poc1 kernel: [ 1789.303432] host7: fcp: 00061e:
xid 0042-050f: DDP I/O in fc_fcp_recv_data set ERROR
7490 May 27 10:45:17 poc1 kernel: [ 1789.303437] host7: xid 42:
f_ctl 90000 seq 1
7491 May 27 10:45:17 poc1 kernel: [ 1789.303813] host7: xid 42: BLS
rctl 85 - BLS reject received
7492 May 27 10:45:17 poc1 kernel: [ 1789.304596] host7: xid 7: BLS
rctl 85 - BLS reject received
7493 May 27 10:45:17 poc1 kernel: [ 1789.304607] host7: xid 1a9:
exch: BLS rctl 84 - BLS accept
7494 May 27 10:45:17 poc1 kernel: [ 1789.304615] host7: fcp: 00061e:
Returning DID_ERROR to scsi-ml due to FC_CMD_ABORTED
7495 May 27 10:45:17 poc1 kernel: [ 1789.306357] host7: xid 10d: BLS
rctl 85 - BLS reject received
7496 May 27 10:45:17 poc1 kernel: [ 1789.306797] host7: xid 18e: BLS
rctl 85 - BLS reject received
7497 May 27 10:45:17 poc1 kernel: [ 1789.307196] host7: fcp: 00061e:
xid 01e4-04a7: DDP I/O in fc_fcp_recv_data set ERROR
7498 May 27 10:45:17 poc1 kernel: [ 1789.307201] host7: xid 1e4:
f_ctl 90000 seq 1
7499 May 27 10:45:17 poc1 kernel: [ 1789.307628] host7: xid 1e4: BLS
rctl 85 - BLS reject received
7500 May 27 10:45:17 poc1 kernel: [ 1789.311225] host7: xid 8d: BLS
rctl 85 - BLS reject received
7501 May 27 10:45:17 poc1 kernel: [ 1789.316454] host7: fcp: 00061e:
xid 0c64-0de8: fsp=ffff88061c238c00:lso:blen=1000 lso_max=0x10000
t_blen=1000
7502 May 27 10:45:17 poc1 kernel: [ 1789.316458] host7: xid c64:
f_ctl 90000 seq 1
7503 May 27 10:45:27 poc1 kernel: [ 1799.306713] host7: xid 6ae:
Exchange timed out
7504 May 27 10:45:27 poc1 kernel: [ 1799.306725] host7: xid e65:
Exchange timer armed : 2000 msecs
7505 May 27 10:45:29 poc1 kernel: [ 1801.312254] host7: xid e65:
Exchange timed out
7506 May 27 10:45:29 poc1 kernel: [ 1801.312261] host7: xid e65:
f_ctl 90000 seq 1
7507 May 27 10:45:29 poc1 kernel: [ 1801.312264] host7: xid e65:
Exchange timer armed : 20000 msecs
7508 May 27 10:45:31 poc1 kernel: [ 1803.197758] host7: rport 00061e:
Remove port
7509 May 27 10:45:31 poc1 kernel: [ 1803.197762] host7: rport 00061e:
Port sending LOGO from Ready state
and finally
7724 May 27 10:45:31 poc1 kernel: [ 1803.199215] host7: fcp: 00061e:
Returning DID_ERROR to scsi-ml due to FC_DATA_UNDRUN (scsi)
7725 May 27 10:45:31 poc1 kernel: [ 1803.199219] host7: xid c44:
f_ctl 90000 seq 1
7726 May 27 10:45:31 poc1 kernel: [ 1803.199223] host7: rport 00061e:
Received a LOGO response closed
7727 May 27 10:45:31 poc1 kernel: [ 1803.199226] host7: xid e65:
Exchange timer canceled
7728 May 27 10:45:31 poc1 kernel: [ 1803.199264] host7: rport 00061e:
work delete
7729 May 27 10:46:31 poc1 kernel: [ 1863.357830] rport-7:0-0:
blocked FC remote port time out: removing target and saving binding
The messages on the target side are similar as before.
This is when Exhange timer message first appears:
18732865 May 27 10:44:14 poc2 kernel: [ 1909.030408] host7: xid c48:
f_ctl 800000 seq 1
18732866 May 27 10:44:14 poc2 kernel: [ 1909.030423] host7: xid c48:
f_ctl 880008 seq 2
18732867 May 27 10:44:14 poc2 kernel: [ 1909.031509] host7: xid 766:
f_ctl 800000 seq 1
18732868 May 27 10:44:14 poc2 kernel: [ 1909.031528] host7: xid 766:
f_ctl 880008 seq 2
18732869 May 27 10:44:14 poc2 kernel: [ 1909.031787] host7: xid cce:
f_ctl 800000 seq 1
18732870 May 27 10:44:14 poc2 kernel: [ 1909.031799] host7: xid cce:
f_ctl 880008 seq 2
18732871 May 27 10:44:14 poc2 kernel: [ 1909.031944] host7: xid 2c7:
f_ctl 800000 seq 1
18732872 May 27 10:44:14 poc2 kernel: [ 1909.031971] host7: xid 2c7:
f_ctl 880008 seq 2
18732873 May 27 10:44:14 poc2 kernel: [ 1909.032145] host7: xid f09:
f_ctl 800000 seq 1
18732874 May 27 10:44:14 poc2 kernel: [ 1909.032166] host7: xid f09:
f_ctl 880008 seq 2
18732875 May 27 10:44:14 poc2 kernel: [ 1909.032374] host7: xid ce0:
f_ctl 800000 seq 1
18732876 May 27 10:44:14 poc2 kernel: [ 1909.032383] host7: xid ce0:
f_ctl 880008 seq 2
18732877 May 27 10:44:14 poc2 kernel: [ 1909.032729] host7: xid 8eb:
f_ctl 800000 seq 1
18732878 May 27 10:44:14 poc2 kernel: [ 1909.032736] host7: xid 32f:
f_ctl 800000 seq 1
18732879 May 27 10:44:14 poc2 kernel: [ 1909.032740] host7: xid 8eb:
f_ctl 880008 seq 2
18732880 May 27 10:44:14 poc2 kernel: [ 1909.032759] host7: xid 32f:
f_ctl 880008 seq 2
18732881 May 27 10:44:14 poc2 kernel: [ 1909.033034] host7: xid 98d:
f_ctl 800000 seq 1
18732882 May 27 10:44:14 poc2 kernel: [ 1909.033051] host7: xid 98d:
f_ctl 880008 seq 2
18732883 May 27 10:44:14 poc2 kernel: [ 1909.033262] host7: xid c01:
f_ctl 800000 seq 1
18732884 May 27 10:44:14 poc2 kernel: [ 1909.033290] host7: xid c01:
f_ctl 880008 seq 2
18732885 May 27 10:44:14 poc2 kernel: [ 1909.033373] host7: xid 6ea:
f_ctl 800000 seq 1
18732886 May 27 10:44:14 poc2 kernel: [ 1909.033387] host7: xid 6ea:
f_ctl 880008 seq 2
18732887 May 27 10:44:14 poc2 kernel: [ 1909.033734] host7: xid 766:
f_ctl 800000 seq 1
18732888 May 27 10:44:14 poc2 kernel: [ 1909.033747] host7: xid 766:
f_ctl 880008 seq 2
18732889 May 27 10:44:14 poc2 kernel: [ 1909.034621] host7: xid 30c:
f_ctl 800000 seq 1
18732890 May 27 10:44:14 poc2 kernel: [ 1909.034648] host7: xid 30c:
f_ctl 880008 seq 2
18732891 May 27 10:44:14 poc2 kernel: [ 1909.035309] host7: xid c48:
Exchange timer armed : 0 msecs
18732892 May 27 10:44:14 poc2 kernel: [ 1909.035315] host7: xid c48:
f_ctl 800000 seq 1
18732893 May 27 10:44:14 poc2 kernel: [ 1909.035384] host7: xid c48:
Exchange timed out
This is when ft_queue_data_in failure first appears:
18919838 May 27 10:44:46 poc2 kernel: [ 1940.431788] host7: xid 327:
f_ctl 800000 seq 1
18919839 May 27 10:44:46 poc2 kernel: [ 1940.431801] host7: xid 327:
f_ctl 880008 seq 2
18919840 May 27 10:44:46 poc2 kernel: [ 1940.433931] host7: xid d68:
f_ctl 800000 seq 1
18919841 May 27 10:44:46 poc2 kernel: [ 1940.433951] host7: xid d68:
f_ctl 880008 seq 2
18919842 May 27 10:44:46 poc2 kernel: [ 1940.436138] host7: xid f89:
f_ctl 800000 seq 1
18919843 May 27 10:44:46 poc2 kernel: [ 1940.436174] host7: xid f89:
f_ctl 880008 seq 2
18919844 May 27 10:44:46 poc2 kernel: [ 1940.436889] host7: xid 74a:
f_ctl 800000 seq 1
18919845 May 27 10:44:46 poc2 kernel: [ 1940.436912] host7: xid 74a:
f_ctl 880008 seq 2
18919846 May 27 10:44:46 poc2 kernel: [ 1940.437281] host7: xid 40c:
f_ctl 800000 seq 1
18919847 May 27 10:44:46 poc2 kernel: [ 1940.437326] host7: xid 40c:
f_ctl 880008 seq 2
18919848 May 27 10:44:46 poc2 kernel: [ 1940.437338] host7: xid 92b:
f_ctl 800000 seq 1
18919849 May 27 10:44:46 poc2 kernel: [ 1940.437356] host7: xid 92b:
f_ctl 880008 seq 2
18919850 May 27 10:44:46 poc2 kernel: [ 1940.439508] host7: xid dce:
f_ctl 800000 seq 1
18919851 May 27 10:44:46 poc2 kernel: [ 1940.439539] host7: xid dce:
f_ctl 880008 seq 2
18919852 May 27 10:44:46 poc2 kernel: [ 1940.439551] host7: xid 42f:
f_ctl 800000 seq 1
18919853 May 27 10:44:46 poc2 kernel: [ 1940.439559] host7: xid 42f:
f_ctl 880008 seq 2
18919854 May 27 10:44:46 poc2 kernel: [ 1940.440408] host7: xid 90d:
f_ctl 800000 seq 1
18919855 May 27 10:44:46 poc2 kernel: [ 1940.440425] host7: xid 90d:
f_ctl 880008 seq 2
18919856 May 27 10:44:46 poc2 kernel: [ 1940.440930] host7: xid d60:
f_ctl 800000 seq 1
18919857 May 27 10:44:46 poc2 kernel: [ 1940.440950] host7: xid d60:
f_ctl 880008 seq 2
18919858 May 27 10:44:46 poc2 kernel: [ 1940.440971] host7: xid ac4:
f_ctl 800000 seq 1
18919859 May 27 10:44:46 poc2 kernel: [ 1940.440983] host7: xid ac4:
f_ctl 880008 seq 2
18919860 May 27 10:44:46 poc2 kernel: [ 1940.441482] host7: xid fc2:
f_ctl 800000 seq 1
18919861 May 27 10:44:46 poc2 kernel: [ 1940.441504] host7: xid fc2:
f_ctl 880008 seq 2
18919862 May 27 10:44:46 poc2 kernel: [ 1940.441759] host7: xid 885:
f_ctl 800000 seq 1
18919863 May 27 10:44:46 poc2 kernel: [ 1940.441772] host7: xid 885:
f_ctl 880008 seq 2
18919864 May 27 10:44:46 poc2 kernel: [ 1940.441883] host7: xid ce1:
f_ctl 800000 seq 1
18919865 May 27 10:44:46 poc2 kernel: [ 1940.441909] host7: xid ce1:
f_ctl 880008 seq 2
18919866 May 27 10:44:46 poc2 kernel: [ 1940.443033] host7: xid 7a3:
f_ctl 800000 seq 1
18919867 May 27 10:44:46 poc2 kernel: [ 1940.443063] host7: xid 7a3:
f_ctl 880008 seq 2
18919868 May 27 10:44:46 poc2 kernel: [ 1940.445410] host7: xid 7e6:
f_ctl 800000 seq 1
18919869 May 27 10:44:46 poc2 kernel: [ 1940.445424] host7: xid 7e6:
f_ctl 880008 seq 2
18919870 May 27 10:44:46 poc2 kernel: [ 1940.445609] host7: xid 327:
f_ctl 800000 seq 1
18919871 May 27 10:44:46 poc2 kernel: [ 1940.445621] host7: xid 327:
Exchange timer armed : 0 msecs
18919872 May 27 10:44:46 poc2 kernel: [ 1940.445630] host7: xid 327:
f_ctl 800008 seq 2
18919873 May 27 10:44:46 poc2 kernel: [ 1940.445639] ft_queue_data_in:
Failed to send frame ffff88062cbd3a00, xid <0x327>, remaining 131072,
lso_max <0x10000>
18919874 May 27 10:44:46 poc2 kernel: [ 1940.445645] ft_queue_data_in:
Failed to send frame ffff88062cbd3a00, xid <0x327>, remaining 65536,
lso_max <0x10000>
18919875 May 27 10:44:46 poc2 kernel: [ 1940.445649] ft_queue_data_in:
Failed to send frame ffff88062cbd3a00, xid <0x327>, remaining 0,
lso_max <0x10000>
After stripping out the "host7: xid xxx: f_ctl 8x0000 seq x" lines,
the messages on the target look like:
15476415:May 27 10:42:37 poc2 kernel: [ 1811.571399] perf samples too
long (2670 > 2500), lowering kernel.perf_event_max_sample_rate to
50000
18732891:May 27 10:44:14 poc2 kernel: [ 1909.035309] host7: xid c48:
Exchange timer armed : 0 msecs
18732893:May 27 10:44:14 poc2 kernel: [ 1909.035384] host7: xid c48:
Exchange timed out
18734191:May 27 10:44:14 poc2 kernel: [ 1909.218580] host7: xid 766:
Exchange timer armed : 0 msecs
18734193:May 27 10:44:14 poc2 kernel: [ 1909.218750] host7: xid 766:
Exchange timed out
18734419:May 27 10:44:14 poc2 kernel: [ 1909.254362] host7: xid cce:
Exchange timer armed : 0 msecs
18734422:May 27 10:44:14 poc2 kernel: [ 1909.254385] host7: xid cce:
Exchange timed out
18734439:May 27 10:44:14 poc2 kernel: [ 1909.258190] host7: xid a84:
Exchange timer armed : 0 msecs
18734441:May 27 10:44:14 poc2 kernel: [ 1909.258245] host7: xid a84:
Exchange timed out
18734444:May 27 10:44:14 poc2 kernel: [ 1909.259212] host7: xid d00:
Exchange timer armed : 0 msecs
18734446:May 27 10:44:14 poc2 kernel: [ 1909.259222] host7: xid d00:
Exchange timed out
18734465:May 27 10:44:14 poc2 kernel: [ 1909.261859] host7: xid 287:
Exchange timer armed : 0 msecs
18734467:May 27 10:44:14 poc2 kernel: [ 1909.261910] host7: xid 287:
Exchange timed out
18734632:May 27 10:44:15 poc2 kernel: [ 1909.285827] host7: xid a64:
Exchange timer armed : 0 msecs
18734634:May 27 10:44:15 poc2 kernel: [ 1909.285892] host7: xid a64:
Exchange timed out
18734699:May 27 10:44:15 poc2 kernel: [ 1909.301348] host7: xid be8:
Exchange timer armed : 0 msecs
18734701:May 27 10:44:15 poc2 kernel: [ 1909.301357] host7: xid be8:
Exchange timed out
18734806:May 27 10:44:15 poc2 kernel: [ 1909.319404] host7: xid ea9:
Exchange timer armed : 0 msecs
18734808:May 27 10:44:15 poc2 kernel: [ 1909.319446] host7: xid ea9:
Exchange timed out
18734842:May 27 10:44:15 poc2 kernel: [ 1909.324226] host7: xid c01:
Exchange timer armed : 0 msecs
18734845:May 27 10:44:15 poc2 kernel: [ 1909.324257] host7: xid c01:
Exchange timed out
18734944:May 27 10:44:15 poc2 kernel: [ 1909.347075] host7: xid 30c:
Exchange timer armed : 0 msecs
18734946:May 27 10:44:15 poc2 kernel: [ 1909.347099] host7: xid 30c:
Exchange timed out
18735067:May 27 10:44:15 poc2 kernel: [ 1909.360509] host7: xid c8e:
Exchange timer armed : 0 msecs
18735069:May 27 10:44:15 poc2 kernel: [ 1909.360571] host7: xid c8e:
Exchange timed out
18735154:May 27 10:44:15 poc2 kernel: [ 1909.380932] host7: xid 9ad:
Exchange timer armed : 0 msecs
18735156:May 27 10:44:15 poc2 kernel: [ 1909.380946] host7: xid 9ad:
Exchange timed out
18735253:May 27 10:44:15 poc2 kernel: [ 1909.395037] host7: xid ce0:
Exchange timer armed : 0 msecs
18735255:May 27 10:44:15 poc2 kernel: [ 1909.395113] host7: xid ce0:
Exchange timed out
18735308:May 27 10:44:15 poc2 kernel: [ 1909.405266] host7: xid 746:
Exchange timer armed : 0 msecs
18735312:May 27 10:44:15 poc2 kernel: [ 1909.405332] host7: xid 746:
Exchange timed out
18735317:May 27 10:44:15 poc2 kernel: [ 1909.405832] host7: xid 26c:
Exchange timer armed : 0 msecs
18735319:May 27 10:44:15 poc2 kernel: [ 1909.405852] host7: xid 26c:
Exchange timed out
18735342:May 27 10:44:15 poc2 kernel: [ 1909.412982] host7: xid f09:
Exchange timer armed : 0 msecs
18735344:May 27 10:44:15 poc2 kernel: [ 1909.413062] host7: xid f09:
Exchange timed out
18735572:May 27 10:44:15 poc2 kernel: [ 1909.453616] host7: xid d2e:
Exchange timer armed : 0 msecs
18735574:May 27 10:44:15 poc2 kernel: [ 1909.453631] host7: xid d2e:
Exchange timed out
18735575:May 27 10:44:15 poc2 kernel: [ 1909.454206] host7: xid 32f:
Exchange timer armed : 0 msecs
18735577:May 27 10:44:15 poc2 kernel: [ 1909.454218] host7: xid 32f:
Exchange timed out
18735810:May 27 10:44:15 poc2 kernel: [ 1909.490061] host7: xid c68:
Exchange timer armed : 0 msecs
18735812:May 27 10:44:15 poc2 kernel: [ 1909.490126] host7: xid c68:
Exchange timed out
18735927:May 27 10:44:15 poc2 kernel: [ 1909.515654] host7: xid 267:
Exchange timer armed : 0 msecs
18735929:May 27 10:44:15 poc2 kernel: [ 1909.515667] host7: xid 267:
Exchange timed out
18735940:May 27 10:44:15 poc2 kernel: [ 1909.523278] host7: xid 3cf:
Exchange timer armed : 0 msecs
18735942:May 27 10:44:15 poc2 kernel: [ 1909.523350] host7: xid 3cf:
Exchange timed out
18736041:May 27 10:44:15 poc2 kernel: [ 1909.554439] host7: xid 984:
Exchange timer armed : 0 msecs
18736043:May 27 10:44:15 poc2 kernel: [ 1909.554457] host7: xid 984:
Exchange timed out
18736148:May 27 10:44:15 poc2 kernel: [ 1909.583305] host7: xid 7a6:
Exchange timer armed : 0 msecs
18736150:May 27 10:44:15 poc2 kernel: [ 1909.583332] host7: xid 7a6:
Exchange timed out
18916476:May 27 10:44:45 poc2 kernel: [ 1939.812996] host7: xid 2c7:
Exchange timer armed : 0 msecs
18916478:May 27 10:44:45 poc2 kernel: [ 1939.813075] host7: xid 2c7:
Exchange timed out
18917869:May 27 10:44:45 poc2 kernel: [ 1940.028716] host7: xid 307:
Exchange timer armed : 0 msecs
18917871:May 27 10:44:45 poc2 kernel: [ 1940.028738] host7: xid 307:
Exchange timed out
18917904:May 27 10:44:45 poc2 kernel: [ 1940.036022] host7: xid 44f:
Exchange timer armed : 0 msecs
18917906:May 27 10:44:45 poc2 kernel: [ 1940.036072] host7: xid 44f:
Exchange timed out
18917907:May 27 10:44:45 poc2 kernel: [ 1940.036419] host7: xid d01:
Exchange timer armed : 0 msecs
18917910:May 27 10:44:45 poc2 kernel: [ 1940.036459] host7: xid d01:
Exchange timed out
18918008:May 27 10:44:45 poc2 kernel: [ 1940.050544] host7: xid 94b:
Exchange timer armed : 0 msecs
18918010:May 27 10:44:45 poc2 kernel: [ 1940.050553] host7: xid 94b:
Exchange timed out
18918239:May 27 10:44:45 poc2 kernel: [ 1940.100469] host7: xid 743:
Exchange timer armed : 0 msecs
18918241:May 27 10:44:45 poc2 kernel: [ 1940.100485] host7: xid 743:
Exchange timed out
18918310:May 27 10:44:45 poc2 kernel: [ 1940.111206] host7: xid f82:
Exchange timer armed : 0 msecs
18918312:May 27 10:44:45 poc2 kernel: [ 1940.111357] host7: xid f82:
Exchange timed out
18918329:May 27 10:44:45 poc2 kernel: [ 1940.117376] host7: xid e00:
Exchange timer armed : 0 msecs
18918331:May 27 10:44:45 poc2 kernel: [ 1940.117388] host7: xid e00:
Exchange timed out
18918385:May 27 10:44:45 poc2 kernel: [ 1940.127422] host7: xid d08:
Exchange timer armed : 0 msecs
18918388:May 27 10:44:45 poc2 kernel: [ 1940.127470] host7: xid d08:
Exchange timed out
18918487:May 27 10:44:45 poc2 kernel: [ 1940.142935] host7: xid 8c5:
Exchange timer armed : 0 msecs
18918489:May 27 10:44:45 poc2 kernel: [ 1940.142950] host7: xid 8c5:
Exchange timed out
18918588:May 27 10:44:45 poc2 kernel: [ 1940.158196] host7: xid 92d:
Exchange timer armed : 0 msecs
18918590:May 27 10:44:45 poc2 kernel: [ 1940.158254] host7: xid 92d:
Exchange timed out
18918597:May 27 10:44:45 poc2 kernel: [ 1940.159082] host7: xid a04:
Exchange timer armed : 0 msecs
18918599:May 27 10:44:45 poc2 kernel: [ 1940.159154] host7: xid a04:
Exchange timed out
18918866:May 27 10:44:45 poc2 kernel: [ 1940.209430] host7: xid d28:
Exchange timer armed : 0 msecs
18918868:May 27 10:44:45 poc2 kernel: [ 1940.209484] host7: xid d28:
Exchange timed out
18918909:May 27 10:44:45 poc2 kernel: [ 1940.215142] host7: xid 8ed:
Exchange timer armed : 0 msecs
18918911:May 27 10:44:45 poc2 kernel: [ 1940.215158] host7: xid 8ed:
Exchange timed out
18918918:May 27 10:44:45 poc2 kernel: [ 1940.217253] host7: xid 3ef:
Exchange timer armed : 0 msecs
18918920:May 27 10:44:45 poc2 kernel: [ 1940.217270] host7: xid 3ef:
Exchange timed out
18919025:May 27 10:44:45 poc2 kernel: [ 1940.241859] host7: xid fa9:
Exchange timer armed : 0 msecs
18919027:May 27 10:44:45 poc2 kernel: [ 1940.241904] host7: xid fa9:
Exchange timed out
18919151:May 27 10:44:45 poc2 kernel: [ 1940.267514] host7: xid 38c:
Exchange timer armed : 0 msecs
18919154:May 27 10:44:45 poc2 kernel: [ 1940.267580] host7: xid 38c:
Exchange timed out
18919163:May 27 10:44:45 poc2 kernel: [ 1940.268681] host7: xid d48:
Exchange timer armed : 0 msecs
18919165:May 27 10:44:45 poc2 kernel: [ 1940.268707] host7: xid d48:
Exchange timed out
18919212:May 27 10:44:45 poc2 kernel: [ 1940.278834] host7: xid aa4:
Exchange timer armed : 0 msecs
18919214:May 27 10:44:45 poc2 kernel: [ 1940.278881] host7: xid aa4:
Exchange timed out
18919243:May 27 10:44:45 poc2 kernel: [ 1940.285614] host7: xid f22:
Exchange timer armed : 0 msecs
18919245:May 27 10:44:45 poc2 kernel: [ 1940.285629] host7: xid f22:
Exchange timed out
18919270:May 27 10:44:45 poc2 kernel: [ 1940.293297] host7: xid da0:
Exchange timer armed : 0 msecs
18919272:May 27 10:44:45 poc2 kernel: [ 1940.293346] host7: xid da0:
Exchange timed out
18919321:May 27 10:44:45 poc2 kernel: [ 1940.301098] host7: xid 3cc:
Exchange timer armed : 0 msecs
18919325:May 27 10:44:45 poc2 kernel: [ 1940.301172] host7: xid 3cc:
Exchange timed out
18919472:May 27 10:44:46 poc2 kernel: [ 1940.333266] host7: xid e0e:
Exchange timer armed : 0 msecs
18919474:May 27 10:44:46 poc2 kernel: [ 1940.333299] host7: xid e0e:
Exchange timed out
18919641:May 27 10:44:46 poc2 kernel: [ 1940.368427] host7: xid 2e7:
Exchange timer armed : 0 msecs
18919643:May 27 10:44:46 poc2 kernel: [ 1940.368451] host7: xid 2e7:
Exchange timed out
18919694:May 27 10:44:46 poc2 kernel: [ 1940.386699] host7: xid 826:
Exchange timer armed : 0 msecs
18919696:May 27 10:44:46 poc2 kernel: [ 1940.386744] host7: xid 826:
Exchange timed out
18919785:May 27 10:44:46 poc2 kernel: [ 1940.409090] host7: xid 40f:
Exchange timer armed : 0 msecs
18919787:May 27 10:44:46 poc2 kernel: [ 1940.409125] host7: xid 40f:
Exchange timed out
18919871:May 27 10:44:46 poc2 kernel: [ 1940.445621] host7: xid 327:
Exchange timer armed : 0 msecs
18919873:May 27 10:44:46 poc2 kernel: [ 1940.445639] ft_queue_data_in:
Failed to send frame ffff88062cbd3a00, xid <0x327>, remaining 131072,
lso_max <0x10000>
18919874:May 27 10:44:46 poc2 kernel: [ 1940.445645] ft_queue_data_in:
Failed to send frame ffff88062cbd3a00, xid <0x327>, remaining 65536,
lso_max <0x10000>
18919875:May 27 10:44:46 poc2 kernel: [ 1940.445649] ft_queue_data_in:
Failed to send frame ffff88062cbd3a00, xid <0x327>, remaining 0,
lso_max <0x10000>
18919877:May 27 10:44:46 poc2 kernel: [ 1940.445671] host7: xid 327:
Exchange timed out
18919878:May 27 10:44:46 poc2 kernel: [ 1940.445683] host7: xid ac4:
Exchange timer armed : 2000 msecs
18930794:May 27 10:44:48 poc2 kernel: [ 1942.452127] host7: xid ac4:
Exchange timed out
18930795:May 27 10:44:48 poc2 kernel: [ 1942.452136] host7: xid ac4:
f_ctl 90000 seq 1
18930796:May 27 10:44:48 poc2 kernel: [ 1942.452139] host7: xid ac4:
Exchange timer armed : 20000 msecs
Thanks
Post by Vasu DevPost by Jun WuPost by Jun Wu18630751 May 21 11:01:53 poc2 kernel: [ 1528.191740] host7: xid
Exchange timer armed : 0 msecs
18630752 May 21 11:01:53 poc2 kernel: [ 1528.191747] host7: xid
f_ctl 800000 seq 1
18630753 May 21 11:01:53 poc2 kernel: [ 1528.191756] host7: xid
f_ctl 800000 seq 2
18630754 May 21 11:01:53 poc2 kernel: [ 1528.191763] host7: xid
Exchange timed out
18630755 May 21 11:01:53 poc2 kernel: [ 1528.191777]
Failed to send frame ffff8805bf245e00, xid <0x6e4>, remaining
458752,
Post by Jun Wulso_max <0x10000>
Exchange 0x6e4 is aborted and then target still sending frame, while
later should not occur but first setting up abort with 0 msec timeout
doesn not look correct either and it is different that 8000 ms on
initiator side. Are you using same switch between initiator and target
with DCB enabled ?
Reducing retries could narrow down to early aborts is the cause here,
can you try with REC disabled on initiator side for that using this
change ?
diff --git a/drivers/scsi/libfc/fc_fcp.c b/drivers/scsi/libfc/fc_fcp.c
index 1d7e76e..43fe793 100644
--- a/drivers/scsi/libfc/fc_fcp.c
+++ b/drivers/scsi/libfc/fc_fcp.c
@@ -1160,6 +1160,7 @@ static int fc_fcp_cmd_send(struct fc_lport *lport,
struct fc_fcp_pkt *fsp,
fc_fcp_pkt_hold(fsp); /* hold for fc_fcp_pkt_destroy */
setup_timer(&fsp->timer, fc_fcp_timeout, (unsigned long)fsp);
+ rpriv->flags &= ~FC_RP_FLAGS_REC_SUPPORTED;
if (rpriv->flags & FC_RP_FLAGS_REC_SUPPORTED)
fc_fcp_timer_set(fsp, get_fsp_rec_tov(fsp));