Discussion:
[TrouSerS-users] TPM stops responding after call to internal_EvictKeyBySlot
Mike Gerow
2015-08-18 23:03:34 UTC
Hi folks,

I initially discussed this issue on the tpmdd-devel list here:
http://sourceforge.net/p/tpmdd/mailman/tpmdd-devel/thread/201401262056.25329.PeterHuewe%40gmx.de/#msg31887814
That was a long time ago, and we still see the same kind of issue.
Essentially, the TPM seems to stop responding on the LPC bus entirely. To
diagnose the issue further, I compiled trousers with debugging enabled and
noticed a pattern in the lead-up to the point where the TPM completely
falls over.

This seems to happen right after we try to evict a key:

Jul 6 19:25:26 <hostname> TCSD TCS[26574]: TrouSerS tcs_key.c:259 internal_EvictByKeySlot: Evicting key using FlushSpecific for TPM 1.2
Jul 6 19:25:26 <hostname> To TPM:[26574]: 00 C1 00 00 00 12 00 00 00 BA 00 E8 F6 0A 00 00
Jul 6 19:25:26 <hostname> To TPM:[26574]: 00 01
Jul 6 19:25:26 <hostname> TCSD TDDL[26574]: TrouSerS tddl.c:171 Calling write to driver
Jul 6 19:25:29 <hostname> From TPM:[26574]: 00 C4 00 00 00 0A 00 00 00 00 00 00 00 00 00 00
Jul 6 19:25:29 <hostname> From TPM:[26574]: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Jul 6 19:25:29 <hostname> From TPM:[26574]: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Jul 6 19:25:29 <hostname> From TPM:[26574]: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Jul 6 19:25:29 <hostname> From TPM:[26574]: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Jul 6 19:25:29 <hostname> From TPM:[26574]: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Jul 6 19:25:29 <hostname> From TPM:[26574]: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Jul 6 19:25:29 <hostname> From TPM:[26574]: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
...
<and so on reading 0 from the device>
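For reference, the request itself looks well-formed when decoded as a TPM 1.2 command: tag 0x00C1 (TPM_TAG_RQU_COMMAND), paramSize 18 (exactly the number of bytes sent), ordinal 0xBA (TPM_ORD_FlushSpecific), handle 0x00E8F60A, and resource type 1 (TPM_RT_KEY). The minimal sketch below does that decoding; `decode_flush_specific` is a hypothetical helper, not part of trousers, and the constants are taken from the TPM 1.2 specification.

```python
import struct

# TPM 1.2 constants, as defined in the TPM 1.2 specification (Part 2).
TPM_TAG_RQU_COMMAND = 0x00C1
TPM_ORD_FLUSH_SPECIFIC = 0x000000BA
TPM_RT_KEY = 0x00000001


def decode_flush_specific(raw: bytes) -> dict:
    """Decode a captured TPM 1.2 TPM_FlushSpecific request (hypothetical helper)."""
    if len(raw) < 18:
        raise ValueError(f"need 18 bytes, got {len(raw)}")
    # Command header: 2-byte tag, 4-byte paramSize, 4-byte ordinal, big-endian.
    tag, param_size, ordinal = struct.unpack(">HII", raw[:10])
    if tag != TPM_TAG_RQU_COMMAND:
        raise ValueError(f"unexpected tag 0x{tag:04X}")
    if ordinal != TPM_ORD_FLUSH_SPECIFIC:
        raise ValueError(f"not FlushSpecific: ordinal 0x{ordinal:08X}")
    if param_size != len(raw):
        raise ValueError(f"paramSize {param_size} does not match {len(raw)} bytes")
    # Parameters: 4-byte handle to evict, then 4-byte resource type.
    handle, resource_type = struct.unpack(">II", raw[10:18])
    return {
        "paramSize": param_size,
        "handle": handle,
        "resourceType": resource_type,
        "isKey": resource_type == TPM_RT_KEY,
    }


# The two "To TPM" lines from the trace, concatenated.
request = bytes.fromhex("00 C1 00 00 00 12 00 00 00 BA 00 E8 F6 0A 00 00 00 01")
info = decode_flush_specific(request)
print(f"handle=0x{info['handle']:08X} paramSize={info['paramSize']} key={info['isKey']}")
```

If the command bytes had been malformed, one of the checks would raise; here they pass, which suggests the evict request itself is not the problem.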

I'm really at a loss for how to continue diagnosing this, though. It might
be a hardware issue, but even if that's the case it'd be nice to find a way
to make it happen less often than it does (currently, when the TPM breaks
like this, the only recourse users have is to restart their machine). I can
probably provide more detailed logs if they would help.

Other info that might prove pertinent:
trousers version: 0.3.11.2
kernel version: 3.13.0-61-generic
tpm_version on affected devices:
TPM 1.2 Version Info:
Chip Version: 1.2.8.32
Spec Level: 2
Errata Revision: 3
TPM Vendor ID: STM
TPM Version: 01010000
Manufacturer Info: 53544d20

Thanks for any advice you can offer.
--
Mike Gerow
***@google.com
Ken Goldman
2015-08-20 14:51:18 UTC
It's hard to imagine a hardware problem for something as simple as
flushspecific, but you never know. The trace seems to show the response
being received but tddl expecting/reading something more.
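The bytes support this reading. In TPM 1.2 framing, the first 10 bytes of the reply form a complete response: tag 0x00C4 (TPM_TAG_RSP_COMMAND), paramSize 10, return code 0 (TPM_SUCCESS). Anything read past paramSize bytes does not belong to the response. The sketch below shows how paramSize frames a reply; `frame_response` is hypothetical and is not the actual tddl read logic, and the 128-byte buffer is reconstructed from the eight 16-byte "From TPM" lines shown (the real read went on past that).

```python
import struct

# TPM 1.2 response tags: plain, one-auth and two-auth responses.
TPM_RSP_TAGS = {0x00C4, 0x00C5, 0x00C6}
TPM_SUCCESS = 0x00000000


def frame_response(buf: bytes):
    """Split a raw read into (return_code, framed response, trailing bytes).

    Hypothetical helper: uses the paramSize field of the TPM 1.2 response
    header to decide where the response ends.
    """
    if len(buf) < 10:
        raise ValueError("short read: the response header is 10 bytes")
    # Response header: 2-byte tag, 4-byte paramSize, 4-byte returnCode.
    tag, param_size, return_code = struct.unpack(">HII", buf[:10])
    if tag not in TPM_RSP_TAGS:
        raise ValueError(f"unexpected response tag 0x{tag:04X}")
    if param_size < 10 or param_size > len(buf):
        raise ValueError(f"paramSize {param_size} inconsistent with {len(buf)}-byte read")
    return return_code, buf[:param_size], buf[param_size:]


# Reconstructed from the trace: a 10-byte success reply followed by zeros.
trace = bytes.fromhex("00 C4 00 00 00 0A 00 00 00 00") + bytes(118)
rc, framed, trailing = frame_response(trace)
print(f"rc=0x{rc:08X} framed={len(framed)} bytes, trailing={len(trailing)} bytes")
```

On this trace the TPM reports success for the flush, and all 118 trailing bytes are zeros that are not part of any response.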

To isolate the issue, you might try the IBM TPM utilities at

https://sourceforge.net/projects/ibmswtpm/

This isn't production code, but is useful for debugging. It could
indicate whether the problem is within tcsd or the device driver/TPM.
Mike Gerow
2015-08-20 21:51:55 UTC
Thanks for the suggestions! I should mention that, of the many machines we
have that use the TPM, only this specific model of STM chip gets completely
stuck like this (and even then only intermittently). That said, I don't
think we've actually tested our workload against the IBM swtpm for any
significant length of time. There is a chance the swtpm would catch
something trousers does wrong that the other TPMs just happen to tolerate
better. I'll go ahead and do that!
Post by Ken Goldman
It's hard to imagine a hardware problem for something as simple as
flushspecific, but you never know. The trace seems to show the response
being received but tddl expecting/reading something more.
Yes, I suspect that if there is an actual problem, it happens much earlier
than the point where the key is flushed, and we only see its effects when
the flush happens. And yes, the driver gets very confused when this
happens. If I even try to read the TPM's raw register values from /dev/mem
after it appears to break, every value comes back as 0xFF, as if the TPM
isn't driving the LPC bus at all.
_______________________________________________
TrouSerS-users mailing list
https://lists.sourceforge.net/lists/listinfo/trousers-users
--
Mike Gerow
***@google.com
Ken Goldman
2015-08-21 14:24:55 UTC
Since this seems to be a communications issue, I was suggesting that you
use the IBM TPM __utilities__ (not the SW TPM).

The utilities are demo code, a simplified equivalent of trousers, but you
can easily script them into loops for testing.

It's divide and conquer:

If the utilities work, you can blame trousers.
If the utilities fail, blame the TPM or device driver.
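Scripting a utility into a loop can be as simple as the sketch below, which reruns a command until it fails or hangs. The thread does not name specific utility programs or arguments, so the command is a placeholder (for example, a hypothetical `HYPOTHETICAL_CMD` list that loads and then flushes a key); `stress` is likewise a hypothetical name.

```python
import subprocess
import sys


def stress(cmd, iterations, timeout_s=30.0):
    """Run `cmd` up to `iterations` times and return how many runs succeeded
    before the first failure (non-zero exit or timeout)."""
    for i in range(iterations):
        try:
            result = subprocess.run(cmd, capture_output=True, text=True,
                                    timeout=timeout_s)
        except subprocess.TimeoutExpired:
            # A wedged TPM can show up as a call that never returns.
            print(f"iteration {i}: timed out after {timeout_s}s", file=sys.stderr)
            return i
        if result.returncode != 0:
            print(f"iteration {i}: exit {result.returncode}: "
                  f"{result.stderr.strip()}", file=sys.stderr)
            return i
    return iterations
```

The per-run timeout matters here because the failure mode in this thread is the TPM going silent, which may look like a hang rather than an error exit. If the child is stuck in uninterruptible driver I/O, even killing it may not take effect immediately.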