Discussion:
[PATCH v10 00/12] Fix the on-flight DMA issue on system with amd iommu
(too old to reply)
Baoquan He
2017-08-09 08:40:01 UTC
Permalink
When kernel panicked and jump into the kdump kernel, DMA started by the
1st kernel is not stopped, this is called on-flight DMA. In the current
code it will disable iommu and build new translation table and attach
device to it. This will cause:

1. IO_PAGE_FAULT warning message can be seen.
2. transfer data to or from incorrect areas of memory.

Sometime it causes the dump failure or kernel hang.

The principle of the fix is to copy the old device table to let the old-flight
DMA continue looking up to get correct address translation and irq remap result,
meanwhile to defer the assignment of device to domain to device driver initializtion
stage. The old domain ids used in 1st kernel are reserved. And a new call-back
is_attach_deferred() is added to iommu-ops, will check whether we need defer the
domain attach/detach in iommu-core code. If defer is needed, just return directly
from amd iommu attach/detach function. The attachment will be done in device driver
initializaiton stage when calling get_domain().

Change history:
v9->v10:
Main changes are as follows according to Joerg's suggestion. The detailed
changes will be added to each patch.
- Drop the old patch 12/13 patch.

- Move the old dev table copy out of iommu loop in copy_dev_tables().

- Add global variable amd_iommu_pre_enabled to optimize code change
when call copy_dev_tables().

v8->v9:
Made changes according to Joerg's reviewing comments and suggestions:
- Check if all IOMMUs are pre-enabled, otherwise do not copy dev table
and just continue as normal kernel does.

- Add a new global old_dev_tbl_cpy to point to a newly allocated device
table. The content of old device table will be copied to the specific
device table for copying which old_dev_tbl_cpy points at. If copy failed
we can still use the amd_iommu_dev_table which is allocated in
early_amd_iommu_init(). This is for better rolling back if copy failed,
the amd_iommu_dev_table has got necessary initialization since iommu init.

- Always allocate device table with GFP_DMA32 flag to make sure that they
are under 4G. This tries to work around the issue mentioned in patch 10/13.
Meanwhile double check if the address of device table is above 4G since
it could be touched accidentally in corrupted 1st kernel and not trustworthy
any more.

v7->v8:
Rebase patchset v7 on the latest v4.13-rc1.
- And re-enable printing IO_PAGE_FAULT message in kdump kernel.

- Only disable iommu if amd_iommu=off is specified in kdump kernel.


v6->v7:
Two main changes are made according to Joerg's suggestion:
- Add is_attach_deferred call-back to iommu-ops. With this domain
can be deferred to device driver init cleanly.

- Allocate memory below 4G for dev table if translation pre-enabled.
AMD engineer pointed out that it's unsafe to update the device-table
while iommu is enabled. device-table pointer update is split up into
two 32bit writes in the IOMMU hardware. So updating it while the IOMMU
is enabled could have some nasty side effects.

v5->v6:
According to Joerg's comments made several below main changes:
- Add sanity check when copy old dev tables.

- If a device is set up with guest translations (DTE.GV=1), then don't
copy that information but move the device over to an empty guest-cr3
table and handle the faults in the PPR log (which just answer them
with INVALID).

v5:
bnx2 NIC can't reset itself during driver init. Post patch to reset
it during driver init. IO_PAGE_FAULT can't be seen anymore.

Below is link of v5 post.
https://lists.linuxfoundation.org/pipermail/iommu/2016-September/018527.html


Baoquan He (12):
iommu/amd: Detect pre enabled translation
iommu/amd: add several helper functions
Revert "iommu/amd: Suppress IO_PAGE_FAULTs in kdump kernel"
iommu/amd: Define bit fields for DTE particularly
iommu/amd: Add function copy_dev_tables()
iommu/amd: copy old trans table from old kernel
iommu/amd: Do sanity check for address translation and irq remap of
old dev table entry
iommu: Add is_attach_deferred call-back to iommu-ops
iommu/amd: Use is_attach_deferred call-back
iommu/amd: Allocate memory below 4G for dev table if translation
pre-enabled
iommu/amd: Don't copy GCR3 table root pointer
iommu/amd: Disable iommu only if amd_iommu=off is specified

drivers/iommu/amd_iommu.c | 65 ++++++------
drivers/iommu/amd_iommu_init.c | 223 +++++++++++++++++++++++++++++++++++-----
drivers/iommu/amd_iommu_proto.h | 2 +
drivers/iommu/amd_iommu_types.h | 55 +++++++++-
drivers/iommu/amd_iommu_v2.c | 18 +++-
drivers/iommu/iommu.c | 8 ++
include/linux/iommu.h | 1 +
7 files changed, 306 insertions(+), 66 deletions(-)
--
2.5.5
Baoquan He
2017-08-09 08:40:01 UTC
Permalink
Add function copy_dev_tables to copy the old DEV table entries of the panicked
kernel to the new allocated device table. Since all iommus share the same device
table the copy only need be done one time. Here add a new global old_dev_tbl_cpy
to point to the newly allocated device table which the content of old device
table will be copied to. Besides, we also need to:

- Check whether all IOMMUs actually use the same device table with the same size

- Verify that the size of the old device table is the expected size.

- Reserve the old domain id occupied in 1st kernel to avoid touching the old
io-page tables. Then on-flight DMA can continue looking it up.

And also define MACRO DEV_DOMID_MASK to replace magic number 0xffffULL, it can be
reused in copy_dev_tables().

Signed-off-by: Baoquan He <***@redhat.com>
---
v9->v10:
Move the copy code block out of the iommu loop.

Change the return value of copy_device_table() to bool.

drivers/iommu/amd_iommu.c | 2 +-
drivers/iommu/amd_iommu_init.c | 62 +++++++++++++++++++++++++++++++++++++++++
drivers/iommu/amd_iommu_types.h | 1 +
3 files changed, 64 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 097db07354b4..b22b58b33400 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -2086,7 +2086,7 @@ static void set_dte_entry(u16 devid, struct protection_domain *domain, bool ats)
flags |= tmp;
}

- flags &= ~(0xffffUL);
+ flags &= ~DEV_DOMID_MASK;
flags |= domain->id;

amd_iommu_dev_table[devid].data[1] = flags;
diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index 7044510654fe..e2857204d32a 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -195,6 +195,11 @@ spinlock_t amd_iommu_pd_lock;
* page table root pointer.
*/
struct dev_table_entry *amd_iommu_dev_table;
+/*
+ * Pointer to a device table which the content of old device table
+ * will be copied to. It's only be used in kdump kernel.
+ */
+static struct dev_table_entry *old_dev_tbl_cpy;

/*
* The alias table is a driver specific data structure which contains the
@@ -842,6 +847,63 @@ static int get_dev_entry_bit(u16 devid, u8 bit)
}


+static bool copy_device_table(void)
+{
+ struct dev_table_entry *old_devtb = NULL;
+ u32 lo, hi, devid, old_devtb_size;
+ phys_addr_t old_devtb_phys;
+ u64 entry, last_entry = 0;
+ struct amd_iommu *iommu;
+ u16 dom_id, dte_v;
+ gfp_t gfp_flag;
+
+
+ pr_warn("Translation is already enabled - trying to copy translation structures\n");
+ for_each_iommu(iommu) {
+ /* All IOMMUs should use the same device table with the same size */
+ lo = readl(iommu->mmio_base + MMIO_DEV_TABLE_OFFSET);
+ hi = readl(iommu->mmio_base + MMIO_DEV_TABLE_OFFSET + 4);
+ entry = (((u64) hi) << 32) + lo;
+ if (last_entry && last_entry != entry) {
+ pr_err("IOMMU:%d should use the same dev table as others!/n",
+ iommu->index);
+ return false;
+ }
+ last_entry = entry;
+
+ old_devtb_size = ((entry & ~PAGE_MASK) + 1) << 12;
+ if (old_devtb_size != dev_table_size) {
+ pr_err("The device table size of IOMMU:%d is not expected!/n",
+ iommu->index);
+ return false;
+ }
+ }
+
+ old_devtb_phys = entry & PAGE_MASK;
+ old_devtb = memremap(old_devtb_phys, dev_table_size, MEMREMAP_WB);
+ if (!old_devtb)
+ return false;
+
+ gfp_flag = GFP_KERNEL | __GFP_ZERO;
+ old_dev_tbl_cpy = (void *)__get_free_pages(gfp_flag,
+ get_order(dev_table_size));
+ if (old_dev_tbl_cpy == NULL) {
+ pr_err("Failed to allocate memory for copying old device table!/n");
+ return false;
+ }
+
+ for (devid = 0; devid <= amd_iommu_last_bdf; ++devid) {
+ old_dev_tbl_cpy[devid] = old_devtb[devid];
+ dom_id = old_devtb[devid].data[1] & DEV_DOMID_MASK;
+ dte_v = old_devtb[devid].data[0] & DTE_FLAG_V;
+ if (dte_v && dom_id)
+ __set_bit(dom_id, amd_iommu_pd_alloc_bitmap);
+ }
+ memunmap(old_devtb);
+
+ return true;
+}
+
void amd_iommu_apply_erratum_63(u16 devid)
{
int sysmgt;
diff --git a/drivers/iommu/amd_iommu_types.h b/drivers/iommu/amd_iommu_types.h
index f88e802481a3..a7f6cf8c841e 100644
--- a/drivers/iommu/amd_iommu_types.h
+++ b/drivers/iommu/amd_iommu_types.h
@@ -336,6 +336,7 @@
#define DTE_FLAG_MASK (0x3ffULL << 32)
#define DTE_GLX_SHIFT (56)
#define DTE_GLX_MASK (3)
+#define DEV_DOMID_MASK 0xffffULL

#define DTE_GCR3_VAL_A(x) (((x) >> 12) & 0x00007ULL)
#define DTE_GCR3_VAL_B(x) (((x) >> 15) & 0x0ffffULL)
--
2.5.5
Baoquan He
2017-08-09 08:40:01 UTC
Permalink
It's ok to disable iommu early in normal kernel or in kdump kernel when
amd_iommu=off is specified. While we should not disable it in kdump kernel
when on-flight dma is still on-going.

Signed-off-by: Baoquan He <***@redhat.com>
---
v9->v10:
Change to call disable_iommus() in normal kernel and the case that
amd_iommu=off is set in kdump kernel. Otherwise if in kdump kernel
but amd_iommu=off is not specified, we can just keep it as it is
in 1st kernel.

drivers/iommu/amd_iommu_init.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index 88e7a6e950ae..c7d03251c80a 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -2499,7 +2499,8 @@ static int __init early_amd_iommu_init(void)
goto out;

/* Disable any previously enabled IOMMUs */
- disable_iommus();
+ if (!is_kdump_kernel() || amd_iommu_disabled)
+ disable_iommus();

if (amd_iommu_irq_remap)
amd_iommu_irq_remap = check_ioapic_information();
--
2.5.5
Baoquan He
2017-08-09 08:40:01 UTC
Permalink
Move single iommu enabling codes into a wrapper function early_enable_iommu().
This can make later kdump change easier.

And also add iommu_disable_command_buffer and iommu_disable_event_buffer
for later usage.

Signed-off-by: Baoquan He <***@redhat.com>
---
drivers/iommu/amd_iommu_init.c | 42 +++++++++++++++++++++++++++++++-----------
1 file changed, 31 insertions(+), 11 deletions(-)

diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index 3f72f44fa2df..277838dbc3a6 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -634,6 +634,14 @@ static void iommu_enable_command_buffer(struct amd_iommu *iommu)
amd_iommu_reset_cmd_buffer(iommu);
}

+/*
+ * This function disables the command buffer
+ */
+static void iommu_disable_command_buffer(struct amd_iommu *iommu)
+{
+ iommu_feature_disable(iommu, CONTROL_CMDBUF_EN);
+}
+
static void __init free_command_buffer(struct amd_iommu *iommu)
{
free_pages((unsigned long)iommu->cmd_buf, get_order(CMD_BUFFER_SIZE));
@@ -666,6 +674,14 @@ static void iommu_enable_event_buffer(struct amd_iommu *iommu)
iommu_feature_enable(iommu, CONTROL_EVT_LOG_EN);
}

+/*
+ * This function disables the event log buffer
+ */
+static void iommu_disable_event_buffer(struct amd_iommu *iommu)
+{
+ iommu_feature_disable(iommu, CONTROL_EVT_LOG_EN);
+}
+
static void __init free_event_buffer(struct amd_iommu *iommu)
{
free_pages((unsigned long)iommu->evt_buf, get_order(EVT_BUFFER_SIZE));
@@ -2046,6 +2062,19 @@ static void iommu_enable_ga(struct amd_iommu *iommu)
#endif
}

+static void early_enable_iommu(struct amd_iommu *iommu)
+{
+ iommu_disable(iommu);
+ iommu_init_flags(iommu);
+ iommu_set_device_table(iommu);
+ iommu_enable_command_buffer(iommu);
+ iommu_enable_event_buffer(iommu);
+ iommu_set_exclusion_range(iommu);
+ iommu_enable_ga(iommu);
+ iommu_enable(iommu);
+ iommu_flush_all_caches(iommu);
+}
+
/*
* This function finally enables all IOMMUs found in the system after
* they have been initialized
@@ -2054,17 +2083,8 @@ static void early_enable_iommus(void)
{
struct amd_iommu *iommu;

- for_each_iommu(iommu) {
- iommu_disable(iommu);
- iommu_init_flags(iommu);
- iommu_set_device_table(iommu);
- iommu_enable_command_buffer(iommu);
- iommu_enable_event_buffer(iommu);
- iommu_set_exclusion_range(iommu);
- iommu_enable_ga(iommu);
- iommu_enable(iommu);
- iommu_flush_all_caches(iommu);
- }
+ for_each_iommu(iommu)
+ early_enable_iommu(iommu);

#ifdef CONFIG_IRQ_REMAP
if (AMD_IOMMU_GUEST_IR_VAPIC(amd_iommu_guest_ir))
--
2.5.5
Baoquan He
2017-08-09 08:40:02 UTC
Permalink
Firstly split the dev table entry copy into address translation part
and irq remapping part. Because these two parts could be enabled
independently.

Secondly do sanity check for address translation and irq remap of old
dev table entry separately.

Signed-off-by: Baoquan He <***@redhat.com>
---
drivers/iommu/amd_iommu.c | 5 -----
drivers/iommu/amd_iommu_init.c | 23 ++++++++++++++++++++---
drivers/iommu/amd_iommu_types.h | 8 ++++++++
3 files changed, 28 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index b22b58b33400..dab901b4f0f9 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -3776,11 +3776,6 @@ EXPORT_SYMBOL(amd_iommu_device_info);

static struct irq_chip amd_ir_chip;

-#define DTE_IRQ_PHYS_ADDR_MASK (((1ULL << 45)-1) << 6)
-#define DTE_IRQ_REMAP_INTCTL (2ULL << 60)
-#define DTE_IRQ_TABLE_LEN (8ULL << 1)
-#define DTE_IRQ_REMAP_ENABLE 1ULL
-
static void set_dte_irq_entry(u16 devid, struct irq_remap_table *table)
{
u64 dte;
diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index 959c25d997e1..d08ad74b0928 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -852,12 +852,12 @@ static int get_dev_entry_bit(u16 devid, u8 bit)

static bool copy_device_table(void)
{
+ u64 int_ctl, int_tab_len, entry, last_entry = 0;
struct dev_table_entry *old_devtb = NULL;
u32 lo, hi, devid, old_devtb_size;
phys_addr_t old_devtb_phys;
- u64 entry, last_entry = 0;
struct amd_iommu *iommu;
- u16 dom_id, dte_v;
+ u16 dom_id, dte_v, irq_v;
gfp_t gfp_flag;

if (!amd_iommu_pre_enabled)
@@ -901,8 +901,25 @@ static bool copy_device_table(void)
old_dev_tbl_cpy[devid] = old_devtb[devid];
dom_id = old_devtb[devid].data[1] & DEV_DOMID_MASK;
dte_v = old_devtb[devid].data[0] & DTE_FLAG_V;
- if (dte_v && dom_id)
+
+ if (dte_v && dom_id) {
+ old_dev_tbl_cpy[devid].data[0] = old_devtb[devid].data[0];
+ old_dev_tbl_cpy[devid].data[1] = old_devtb[devid].data[1];
__set_bit(dom_id, amd_iommu_pd_alloc_bitmap);
+ }
+
+ irq_v = old_devtb[devid].data[2] & DTE_IRQ_REMAP_ENABLE;
+ int_ctl = old_devtb[devid].data[2] & DTE_IRQ_REMAP_INTCTL_MASK;
+ int_tab_len = old_devtb[devid].data[2] & DTE_IRQ_TABLE_LEN_MASK;
+ if (irq_v && (int_ctl || int_tab_len)) {
+ if ((int_ctl != DTE_IRQ_REMAP_INTCTL) ||
+ (int_tab_len != DTE_IRQ_TABLE_LEN)) {
+ pr_err("Wrong old irq remapping flag: %#x\n", devid);
+ return false;
+ }
+
+ old_dev_tbl_cpy[devid].data[2] = old_devtb[devid].data[2];
+ }
}
memunmap(old_devtb);

diff --git a/drivers/iommu/amd_iommu_types.h b/drivers/iommu/amd_iommu_types.h
index a7f6cf8c841e..f0979183ec9b 100644
--- a/drivers/iommu/amd_iommu_types.h
+++ b/drivers/iommu/amd_iommu_types.h
@@ -250,6 +250,14 @@

#define GA_GUEST_NR 0x1

+/* Bit value definition for dte irq remapping fields*/
+#define DTE_IRQ_PHYS_ADDR_MASK (((1ULL << 45)-1) << 6)
+#define DTE_IRQ_REMAP_INTCTL_MASK (0x3ULL << 60)
+#define DTE_IRQ_TABLE_LEN_MASK (0xfULL << 1)
+#define DTE_IRQ_REMAP_INTCTL (2ULL << 60)
+#define DTE_IRQ_TABLE_LEN (8ULL << 1)
+#define DTE_IRQ_REMAP_ENABLE 1ULL
+
#define PAGE_MODE_NONE 0x00
#define PAGE_MODE_1_LEVEL 0x01
#define PAGE_MODE_2_LEVEL 0x02
--
2.5.5
Baoquan He
2017-08-09 08:40:02 UTC
Permalink
In AMD-Vi spec several bits of IO PTE fields and DTE fields are similar
so that both of them can share the same MACRO definition. However
defining them respectively can make code more read-able. Do it now.

Signed-off-by: Baoquan He <***@redhat.com>
---
drivers/iommu/amd_iommu.c | 8 ++++----
drivers/iommu/amd_iommu_types.h | 18 ++++++++++++++----
2 files changed, 18 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 6d2fc40a086d..097db07354b4 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -1537,9 +1537,9 @@ static int iommu_map_page(struct protection_domain *dom,

if (count > 1) {
__pte = PAGE_SIZE_PTE(phys_addr, page_size);
- __pte |= PM_LEVEL_ENC(7) | IOMMU_PTE_P | IOMMU_PTE_FC;
+ __pte |= PM_LEVEL_ENC(7) | IOMMU_PTE_PR | IOMMU_PTE_FC;
} else
- __pte = phys_addr | IOMMU_PTE_P | IOMMU_PTE_FC;
+ __pte = phys_addr | IOMMU_PTE_PR | IOMMU_PTE_FC;

if (prot & IOMMU_PROT_IR)
__pte |= IOMMU_PTE_IR;
@@ -2053,7 +2053,7 @@ static void set_dte_entry(u16 devid, struct protection_domain *domain, bool ats)

pte_root |= (domain->mode & DEV_ENTRY_MODE_MASK)
<< DEV_ENTRY_MODE_SHIFT;
- pte_root |= IOMMU_PTE_IR | IOMMU_PTE_IW | IOMMU_PTE_P | IOMMU_PTE_TV;
+ pte_root |= DTE_FLAG_IR | DTE_FLAG_IW | DTE_FLAG_V | DTE_FLAG_TV;

flags = amd_iommu_dev_table[devid].data[1];

@@ -2096,7 +2096,7 @@ static void set_dte_entry(u16 devid, struct protection_domain *domain, bool ats)
static void clear_dte_entry(u16 devid)
{
/* remove entry from the device table seen by the hardware */
- amd_iommu_dev_table[devid].data[0] = IOMMU_PTE_P | IOMMU_PTE_TV;
+ amd_iommu_dev_table[devid].data[0] = DTE_FLAG_V | DTE_FLAG_TV;
amd_iommu_dev_table[devid].data[1] &= DTE_FLAG_MASK;

amd_iommu_apply_erratum_63(devid);
diff --git a/drivers/iommu/amd_iommu_types.h b/drivers/iommu/amd_iommu_types.h
index db7ceb4d0957..f88e802481a3 100644
--- a/drivers/iommu/amd_iommu_types.h
+++ b/drivers/iommu/amd_iommu_types.h
@@ -265,7 +265,7 @@
#define PM_LEVEL_INDEX(x, a) (((a) >> PM_LEVEL_SHIFT((x))) & 0x1ffULL)
#define PM_LEVEL_ENC(x) (((x) << 9) & 0xe00ULL)
#define PM_LEVEL_PDE(x, a) ((a) | PM_LEVEL_ENC((x)) | \
- IOMMU_PTE_P | IOMMU_PTE_IR | IOMMU_PTE_IW)
+ IOMMU_PTE_PR | IOMMU_PTE_IR | IOMMU_PTE_IW)
#define PM_PTE_LEVEL(pte) (((pte) >> 9) & 0x7ULL)

#define PM_MAP_4k 0
@@ -314,13 +314,23 @@
#define PTE_LEVEL_PAGE_SIZE(level) \
(1ULL << (12 + (9 * (level))))

-#define IOMMU_PTE_P (1ULL << 0)
-#define IOMMU_PTE_TV (1ULL << 1)
+/*
+ * Bit value definition for I/O PTE fields
+ */
+#define IOMMU_PTE_PR (1ULL << 0)
#define IOMMU_PTE_U (1ULL << 59)
#define IOMMU_PTE_FC (1ULL << 60)
#define IOMMU_PTE_IR (1ULL << 61)
#define IOMMU_PTE_IW (1ULL << 62)

+/*
+ * Bit value definition for DTE fields
+ */
+#define DTE_FLAG_V (1ULL << 0)
+#define DTE_FLAG_TV (1ULL << 1)
+#define DTE_FLAG_IR (1ULL << 61)
+#define DTE_FLAG_IW (1ULL << 62)
+
#define DTE_FLAG_IOTLB (1ULL << 32)
#define DTE_FLAG_GV (1ULL << 55)
#define DTE_FLAG_MASK (0x3ffULL << 32)
@@ -342,7 +352,7 @@
#define GCR3_VALID 0x01ULL

#define IOMMU_PAGE_MASK (((1ULL << 52) - 1) & ~0xfffULL)
-#define IOMMU_PTE_PRESENT(pte) ((pte) & IOMMU_PTE_P)
+#define IOMMU_PTE_PRESENT(pte) ((pte) & IOMMU_PTE_PR)
#define IOMMU_PTE_PAGE(pte) (phys_to_virt((pte) & IOMMU_PAGE_MASK))
#define IOMMU_PTE_MODE(pte) (((pte) >> 9) & 0x07)
--
2.5.5
Baoquan He
2017-08-09 08:40:02 UTC
Permalink
Implement call-back is_attach_deferred and use it to defer the
domain attach from iommu driver init to device driver init when
iommu is pre-enabled in kdump kernel.

Signed-off-by: Baoquan He <***@redhat.com>
---
drivers/iommu/amd_iommu.c | 23 ++++++++++++++++++++++-
1 file changed, 22 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index dab901b4f0f9..eebf4590cef9 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -121,6 +121,7 @@ struct iommu_dev_data {
PPR completions */
u32 errata; /* Bitmap for errata to apply */
bool use_vapic; /* Enable device to use vapic mode */
+ bool defer_attach;

struct ratelimit_state rs; /* Ratelimit IOPF messages */
};
@@ -371,12 +372,17 @@ static u16 get_alias(struct device *dev)
static struct iommu_dev_data *find_dev_data(u16 devid)
{
struct iommu_dev_data *dev_data;
+ struct amd_iommu *iommu = amd_iommu_rlookup_table[devid];

dev_data = search_dev_data(devid);

- if (dev_data == NULL)
+ if (dev_data == NULL) {
dev_data = alloc_dev_data(devid);

+ if (translation_pre_enabled(iommu))
+ dev_data->defer_attach = true;
+ }
+
return dev_data;
}

@@ -2477,11 +2483,18 @@ static struct iommu_group *amd_iommu_device_group(struct device *dev)
static struct protection_domain *get_domain(struct device *dev)
{
struct protection_domain *domain;
+ struct iommu_domain *io_domain;

if (!check_device(dev))
return ERR_PTR(-EINVAL);

domain = get_dev_data(dev)->domain;
+ if (domain == NULL && get_dev_data(dev)->defer_attach) {
+ get_dev_data(dev)->defer_attach = false;
+ io_domain = iommu_get_domain_for_dev(dev);
+ domain = to_pdomain(io_domain);
+ attach_device(dev, domain);
+ }
if (!dma_ops_domain(domain))
return ERR_PTR(-EBUSY);

@@ -3372,6 +3385,13 @@ static void amd_iommu_apply_resv_region(struct device *dev,
WARN_ON_ONCE(reserve_iova(&dma_dom->iovad, start, end) == NULL);
}

+static bool amd_iommu_is_attach_deferred(struct iommu_domain *domain,
+ struct device *dev)
+{
+ struct iommu_dev_data *dev_data = dev->archdata.iommu;
+ return dev_data->defer_attach;
+}
+
const struct iommu_ops amd_iommu_ops = {
.capable = amd_iommu_capable,
.domain_alloc = amd_iommu_domain_alloc,
@@ -3388,6 +3408,7 @@ const struct iommu_ops amd_iommu_ops = {
.get_resv_regions = amd_iommu_get_resv_regions,
.put_resv_regions = amd_iommu_put_resv_regions,
.apply_resv_region = amd_iommu_apply_resv_region,
+ .is_attach_deferred = amd_iommu_is_attach_deferred,
.pgsize_bitmap = AMD_IOMMU_PGSIZES,
};
--
2.5.5
Baoquan He
2017-08-09 08:40:02 UTC
Permalink
This reverts commit 54bd63570484167cb13edf81e31fff107b879981.

We still need the IO_PAGE_FAULT message to warn error after the
issue of on-flight dma in kdump kernel is fixed.

Signed-off-by: Baoquan He <***@redhat.com>
---
drivers/iommu/amd_iommu.c | 3 +--
drivers/iommu/amd_iommu_init.c | 9 ---------
drivers/iommu/amd_iommu_types.h | 1 -
3 files changed, 1 insertion(+), 12 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 354cbd6392cd..6d2fc40a086d 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -2086,8 +2086,7 @@ static void set_dte_entry(u16 devid, struct protection_domain *domain, bool ats)
flags |= tmp;
}

-
- flags &= ~(DTE_FLAG_SA | 0xffffULL);
+ flags &= ~(0xffffUL);
flags |= domain->id;

amd_iommu_dev_table[devid].data[1] = flags;
diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index 277838dbc3a6..7044510654fe 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -29,7 +29,6 @@
#include <linux/export.h>
#include <linux/iommu.h>
#include <linux/kmemleak.h>
-#include <linux/crash_dump.h>
#include <asm/pci-direct.h>
#include <asm/iommu.h>
#include <asm/gart.h>
@@ -1942,14 +1941,6 @@ static void init_device_table_dma(void)
for (devid = 0; devid <= amd_iommu_last_bdf; ++devid) {
set_dev_entry_bit(devid, DEV_ENTRY_VALID);
set_dev_entry_bit(devid, DEV_ENTRY_TRANSLATION);
- /*
- * In kdump kernels in-flight DMA from the old kernel might
- * cause IO_PAGE_FAULTs. There are no reports that a kdump
- * actually failed because of that, so just disable fault
- * reporting in the hardware to get rid of the messages
- */
- if (is_kdump_kernel())
- set_dev_entry_bit(devid, DEV_ENTRY_NO_PAGE_FAULT);
}
}

diff --git a/drivers/iommu/amd_iommu_types.h b/drivers/iommu/amd_iommu_types.h
index 0c98b2cf04cc..db7ceb4d0957 100644
--- a/drivers/iommu/amd_iommu_types.h
+++ b/drivers/iommu/amd_iommu_types.h
@@ -322,7 +322,6 @@
#define IOMMU_PTE_IW (1ULL << 62)

#define DTE_FLAG_IOTLB (1ULL << 32)
-#define DTE_FLAG_SA (1ULL << 34)
#define DTE_FLAG_GV (1ULL << 55)
#define DTE_FLAG_MASK (0x3ffULL << 32)
#define DTE_GLX_SHIFT (56)
--
2.5.5
Baoquan He
2017-08-09 08:40:02 UTC
Permalink
Add functions to check whether translation is already enabled in IOMMU.

Signed-off-by: Baoquan He <***@redhat.com>
---
drivers/iommu/amd_iommu_init.c | 24 ++++++++++++++++++++++++
drivers/iommu/amd_iommu_proto.h | 1 +
drivers/iommu/amd_iommu_types.h | 3 +++
3 files changed, 28 insertions(+)

diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index 372303700566..3f72f44fa2df 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -258,6 +258,25 @@ static int amd_iommu_enable_interrupts(void);
static int __init iommu_go_to_state(enum iommu_init_state state);
static void init_device_table_dma(void);

+bool translation_pre_enabled(struct amd_iommu *iommu)
+{
+ return (iommu->flags & AMD_IOMMU_FLAG_TRANS_PRE_ENABLED);
+}
+
+static void clear_translation_pre_enabled(struct amd_iommu *iommu)
+{
+ iommu->flags &= ~AMD_IOMMU_FLAG_TRANS_PRE_ENABLED;
+}
+
+static void init_translation_status(struct amd_iommu *iommu)
+{
+ u32 ctrl;
+
+ ctrl = readl(iommu->mmio_base + MMIO_CONTROL_OFFSET);
+ if (ctrl & (1<<CONTROL_IOMMU_EN))
+ iommu->flags |= AMD_IOMMU_FLAG_TRANS_PRE_ENABLED;
+}
+
static inline void update_last_devid(u16 devid)
{
if (devid > amd_iommu_last_bdf)
@@ -1399,6 +1418,11 @@ static int __init init_iommu_one(struct amd_iommu *iommu, struct ivhd_header *h)

iommu->int_enabled = false;

+ init_translation_status(iommu);
+
+ if (translation_pre_enabled(iommu))
+ pr_warn("Translation is already enabled - trying to copy translation structures\n");
+
ret = init_iommu_from_acpi(iommu, h);
if (ret)
return ret;
diff --git a/drivers/iommu/amd_iommu_proto.h b/drivers/iommu/amd_iommu_proto.h
index 466260f8a1df..a9666d2005bb 100644
--- a/drivers/iommu/amd_iommu_proto.h
+++ b/drivers/iommu/amd_iommu_proto.h
@@ -87,4 +87,5 @@ static inline bool iommu_feature(struct amd_iommu *iommu, u64 f)
return !!(iommu->features & f);
}

+extern bool translation_pre_enabled(struct amd_iommu *iommu);
#endif /* _ASM_X86_AMD_IOMMU_PROTO_H */
diff --git a/drivers/iommu/amd_iommu_types.h b/drivers/iommu/amd_iommu_types.h
index 294a409e283b..0c98b2cf04cc 100644
--- a/drivers/iommu/amd_iommu_types.h
+++ b/drivers/iommu/amd_iommu_types.h
@@ -435,6 +435,8 @@ struct iommu_domain;
struct irq_domain;
struct amd_irte_ops;

+#define AMD_IOMMU_FLAG_TRANS_PRE_ENABLED (1 << 0)
+
/*
* This structure contains generic data for IOMMU protection domains
* independent of their use.
@@ -569,6 +571,7 @@ struct amd_iommu {
struct amd_irte_ops *irte_ops;
#endif

+ u32 flags;
volatile u64 __aligned(8) cmd_sem;
};
--
2.5.5
Baoquan He
2017-08-09 08:40:02 UTC
Permalink
AMD pointed out it's unsafe to update the device-table while iommu
is enabled. It turns out that device-table pointer update is split
up into two 32bit writes in the IOMMU hardware. So updating it while
the IOMMU is enabled could have some nasty side effects.

The safe way to work around this is to always allocate the device-table
below 4G, including the old device-table in normal kernel and the
device-table used for copying the content of the old device-table in kdump
kernel. Meanwhile we need check if the address of old device-table is
above 4G because it might has been touched accidentally in corrupted
1st kernel.

Signed-off-by: Baoquan He <***@redhat.com>
---
v9->v10:
The judgement of the address of old_devtb_phys should be '>= 0x100000000ULL'.

drivers/iommu/amd_iommu_init.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index d08ad74b0928..c348732f27d7 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -885,11 +885,15 @@ static bool copy_device_table(void)
}

old_devtb_phys = entry & PAGE_MASK;
+ if (old_devtb_phys >= 0x100000000ULL) {
+ pr_err("The address of old device table is above 4G, not trustworthy!/n");
+ return false;
+ }
old_devtb = memremap(old_devtb_phys, dev_table_size, MEMREMAP_WB);
if (!old_devtb)
return false;

- gfp_flag = GFP_KERNEL | __GFP_ZERO;
+ gfp_flag = GFP_KERNEL | __GFP_ZERO | GFP_DMA32;
old_dev_tbl_cpy = (void *)__get_free_pages(gfp_flag,
get_order(dev_table_size));
if (old_dev_tbl_cpy == NULL) {
@@ -2432,7 +2436,8 @@ static int __init early_amd_iommu_init(void)

/* Device table - directly used by all IOMMUs */
ret = -ENOMEM;
- amd_iommu_dev_table = (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO,
+ amd_iommu_dev_table = (void *)__get_free_pages(
+ GFP_KERNEL | __GFP_ZERO | GFP_DMA32,
get_order(dev_table_size));
if (amd_iommu_dev_table == NULL)
goto out;
--
2.5.5
Baoquan He
2017-08-09 08:40:02 UTC
Permalink
When iommu is pre_enabled in kdump kernel, if a device is set up with
guest translations (DTE.GV=1), then don't copy GCR3 table root pointer
but move the device over to an empty guest-cr3 table and handle the
faults in the PPR log (which answer them with INVALID). After all these
PPR faults are recoverable for the device and we should not allow the
device to change old-kernels data when we don't have to.

Signed-off-by: Baoquan He <***@redhat.com>
---
v9->v10:
Clear the DTE_FLAG_GV when handle the GCR3 table root pointer.

drivers/iommu/amd_iommu.c | 28 +++-------------------------
drivers/iommu/amd_iommu_init.c | 12 ++++++++++++
drivers/iommu/amd_iommu_proto.h | 1 +
drivers/iommu/amd_iommu_types.h | 24 ++++++++++++++++++++++++
drivers/iommu/amd_iommu_v2.c | 18 +++++++++++++++++-
5 files changed, 57 insertions(+), 26 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index eebf4590cef9..9e8ea1907796 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -103,30 +103,6 @@ int amd_iommu_max_glx_val = -1;
static const struct dma_map_ops amd_iommu_dma_ops;

/*
- * This struct contains device specific data for the IOMMU
- */
-struct iommu_dev_data {
- struct list_head list; /* For domain->dev_list */
- struct list_head dev_data_list; /* For global dev_data_list */
- struct protection_domain *domain; /* Domain the device is bound to */
- u16 devid; /* PCI Device ID */
- u16 alias; /* Alias Device ID */
- bool iommu_v2; /* Device can make use of IOMMUv2 */
- bool passthrough; /* Device is identity mapped */
- struct {
- bool enabled;
- int qdep;
- } ats; /* ATS state */
- bool pri_tlp; /* PASID TLB required for
- PPR completions */
- u32 errata; /* Bitmap for errata to apply */
- bool use_vapic; /* Enable device to use vapic mode */
- bool defer_attach;
-
- struct ratelimit_state rs; /* Ratelimit IOPF messages */
-};
-
-/*
* general struct to manage commands send to an IOMMU
*/
struct iommu_cmd {
@@ -386,10 +362,11 @@ static struct iommu_dev_data *find_dev_data(u16 devid)
return dev_data;
}

-static struct iommu_dev_data *get_dev_data(struct device *dev)
+struct iommu_dev_data *get_dev_data(struct device *dev)
{
return dev->archdata.iommu;
}
+EXPORT_SYMBOL(get_dev_data);

/*
* Find or create an IOMMU group for a acpihid device.
@@ -2540,6 +2517,7 @@ static int dir2prot(enum dma_data_direction direction)
else
return 0;
}
+
/*
* This function contains common code for mapping of a physically
* contiguous memory region into DMA address space. It is used by all
diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index c348732f27d7..88e7a6e950ae 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -214,6 +214,7 @@ u16 *amd_iommu_alias_table;
* for a specific device. It is also indexed by the PCI device id.
*/
struct amd_iommu **amd_iommu_rlookup_table;
+EXPORT_SYMBOL(amd_iommu_rlookup_table);

/*
* This table is used to find the irq remapping table for a given device id
@@ -269,6 +270,7 @@ bool translation_pre_enabled(struct amd_iommu *iommu)
{
return (iommu->flags & AMD_IOMMU_FLAG_TRANS_PRE_ENABLED);
}
+EXPORT_SYMBOL(translation_pre_enabled);

static void clear_translation_pre_enabled(struct amd_iommu *iommu)
{
@@ -859,6 +861,7 @@ static bool copy_device_table(void)
struct amd_iommu *iommu;
u16 dom_id, dte_v, irq_v;
gfp_t gfp_flag;
+ u64 tmp;

if (!amd_iommu_pre_enabled)
return false;
@@ -910,6 +913,15 @@ static bool copy_device_table(void)
old_dev_tbl_cpy[devid].data[0] = old_devtb[devid].data[0];
old_dev_tbl_cpy[devid].data[1] = old_devtb[devid].data[1];
__set_bit(dom_id, amd_iommu_pd_alloc_bitmap);
+ /* If gcr3 table existed, mask it out */
+ if (old_devtb[devid].data[0] & DTE_FLAG_GV) {
+ tmp = DTE_GCR3_VAL_B(~0ULL) << DTE_GCR3_SHIFT_B;
+ tmp |= DTE_GCR3_VAL_C(~0ULL) << DTE_GCR3_SHIFT_C;
+ old_dev_tbl_cpy[devid].data[1] &= ~tmp;
+ tmp = DTE_GCR3_VAL_A(~0ULL) << DTE_GCR3_SHIFT_A;
+ tmp |= DTE_FLAG_GV;
+ old_dev_tbl_cpy[devid].data[0] &= ~tmp;
+ }
}

irq_v = old_devtb[devid].data[2] & DTE_IRQ_REMAP_ENABLE;
diff --git a/drivers/iommu/amd_iommu_proto.h b/drivers/iommu/amd_iommu_proto.h
index a9666d2005bb..90e62e9b01c5 100644
--- a/drivers/iommu/amd_iommu_proto.h
+++ b/drivers/iommu/amd_iommu_proto.h
@@ -88,4 +88,5 @@ static inline bool iommu_feature(struct amd_iommu *iommu, u64 f)
}

extern bool translation_pre_enabled(struct amd_iommu *iommu);
+extern struct iommu_dev_data *get_dev_data(struct device *dev);
#endif /* _ASM_X86_AMD_IOMMU_PROTO_H */
diff --git a/drivers/iommu/amd_iommu_types.h b/drivers/iommu/amd_iommu_types.h
index f0979183ec9b..9e5af13be7c5 100644
--- a/drivers/iommu/amd_iommu_types.h
+++ b/drivers/iommu/amd_iommu_types.h
@@ -618,6 +618,30 @@ struct devid_map {
bool cmd_line;
};

+/*
+ * This struct contains device specific data for the IOMMU
+ */
+struct iommu_dev_data {
+ struct list_head list; /* For domain->dev_list */
+ struct list_head dev_data_list; /* For global dev_data_list */
+ struct protection_domain *domain; /* Domain the device is bound to */
+ u16 devid; /* PCI Device ID */
+ u16 alias; /* Alias Device ID */
+ bool iommu_v2; /* Device can make use of IOMMUv2 */
+ bool passthrough; /* Device is identity mapped */
+ struct {
+ bool enabled;
+ int qdep;
+ } ats; /* ATS state */
+ bool pri_tlp; /* PASID TLB required for
+ PPR completions */
+ u32 errata; /* Bitmap for errata to apply */
+ bool use_vapic; /* Enable device to use vapic mode */
+ bool defer_attach;
+
+ struct ratelimit_state rs; /* Ratelimit IOPF messages */
+};
+
/* Map HPET and IOAPIC ids to the devid used by the IOMMU */
extern struct list_head ioapic_map;
extern struct list_head hpet_map;
diff --git a/drivers/iommu/amd_iommu_v2.c b/drivers/iommu/amd_iommu_v2.c
index 6629c472eafd..e705fac89cb4 100644
--- a/drivers/iommu/amd_iommu_v2.c
+++ b/drivers/iommu/amd_iommu_v2.c
@@ -562,14 +562,30 @@ static int ppr_notifier(struct notifier_block *nb, unsigned long e, void *data)
unsigned long flags;
struct fault *fault;
bool finish;
- u16 tag;
+ u16 tag, devid;
int ret;
+ struct iommu_dev_data *dev_data;
+ struct pci_dev *pdev = NULL;

iommu_fault = data;
tag = iommu_fault->tag & 0x1ff;
finish = (iommu_fault->tag >> 9) & 1;

+ devid = iommu_fault->device_id;
+ pdev = pci_get_bus_and_slot(PCI_BUS_NUM(devid), devid & 0xff);
+ if (!pdev)
+ return -ENODEV;
+ dev_data = get_dev_data(&pdev->dev);
+
+ /* In kdump kernel pci dev is not initialized yet -> send INVALID */
ret = NOTIFY_DONE;
+ if (translation_pre_enabled(amd_iommu_rlookup_table[devid])
+ && dev_data->defer_attach) {
+ amd_iommu_complete_ppr(pdev, iommu_fault->pasid,
+ PPR_INVALID, tag);
+ goto out;
+ }
+
dev_state = get_device_state(iommu_fault->device_id);
if (dev_state == NULL)
goto out;
--
2.5.5
Baoquan He
2017-08-09 08:40:03 UTC
Permalink
This new call-back will be used to check if the domain attach need be
deferred for now. If yes, the domain attach/detach will return directly.

Signed-off-by: Baoquan He <***@redhat.com>
---
drivers/iommu/iommu.c | 8 ++++++++
include/linux/iommu.h | 1 +
2 files changed, 9 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 3f6ea160afed..86581b115b92 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1283,6 +1283,10 @@ static int __iommu_attach_device(struct iommu_domain *domain,
struct device *dev)
{
int ret;
+ if ((domain->ops->is_attach_deferred != NULL) &&
+ domain->ops->is_attach_deferred(domain, dev))
+ return 0;
+
if (unlikely(domain->ops->attach_dev == NULL))
return -ENODEV;

@@ -1324,6 +1328,10 @@ EXPORT_SYMBOL_GPL(iommu_attach_device);
static void __iommu_detach_device(struct iommu_domain *domain,
struct device *dev)
{
+ if ((domain->ops->is_attach_deferred != NULL) &&
+ domain->ops->is_attach_deferred(domain, dev))
+ return;
+
if (unlikely(domain->ops->detach_dev == NULL))
return;

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 2cb54adc4a33..63983c9e6c3a 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -225,6 +225,7 @@ struct iommu_ops {
u32 (*domain_get_windows)(struct iommu_domain *domain);

int (*of_xlate)(struct device *dev, struct of_phandle_args *args);
+ bool (*is_attach_deferred)(struct iommu_domain *domain, struct device *dev);

unsigned long pgsize_bitmap;
};
--
2.5.5
Baoquan He
2017-08-09 08:40:03 UTC
Permalink
Here several things need be done:
- If iommu is pre-enabled in a normal kernel, just disable it and print
warning.

- If any one of IOMMUs is not pre-enabled in kdump kernel, just continue
as it does in normal kernel.

- If failed to copy dev table of old kernel, continue to proceed as
it does in normal kernel.

- Only if all IOMMUs are pre-enabled and copy dev table is done well, free
the dev table allocated in early_amd_iommu_init() and make amd_iommu_dev_table
point to the copied one.

- Disable and Re-enable event/cmd buffer, install the copied DTE table
to reg, and detect and enable guest vapic.

- Flush all caches

Signed-off-by: Baoquan He <***@redhat.com>
---
v9->v10:
Add global variable amd_iommu_pre_enabled to mark if all IOMMUs are
pre-enabled. If any one is not pre-enabled, just set it as false.

Check amd_iommu_pre_enabled in copy_device_table(), return directly from
copy_device_table() if it's false.

Judge by the return value of copy_device_table() to decide if it will go
through the normal handling or special handling after copy.

drivers/iommu/amd_iommu_init.c | 59 ++++++++++++++++++++++++++++++++++++------
1 file changed, 51 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index e2857204d32a..959c25d997e1 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -37,6 +37,7 @@
#include <asm/io_apic.h>
#include <asm/irq_remapping.h>

+#include <linux/crash_dump.h>
#include "amd_iommu_proto.h"
#include "amd_iommu_types.h"
#include "irq_remapping.h"
@@ -262,6 +263,8 @@ static int amd_iommu_enable_interrupts(void);
static int __init iommu_go_to_state(enum iommu_init_state state);
static void init_device_table_dma(void);

+static bool __initdata amd_iommu_pre_enabled = true;
+
bool translation_pre_enabled(struct amd_iommu *iommu)
{
return (iommu->flags & AMD_IOMMU_FLAG_TRANS_PRE_ENABLED);
@@ -857,6 +860,8 @@ static bool copy_device_table(void)
u16 dom_id, dte_v;
gfp_t gfp_flag;

+ if (!amd_iommu_pre_enabled)
+ return false;

pr_warn("Translation is already enabled - trying to copy translation structures\n");
for_each_iommu(iommu) {
@@ -1496,9 +1501,14 @@ static int __init init_iommu_one(struct amd_iommu *iommu, struct ivhd_header *h)
iommu->int_enabled = false;

init_translation_status(iommu);
-
- if (translation_pre_enabled(iommu))
- pr_warn("Translation is already enabled - trying to copy translation structures\n");
+ if (translation_pre_enabled(iommu) && !is_kdump_kernel()) {
+ iommu_disable(iommu);
+ clear_translation_pre_enabled(iommu);
+ pr_warn("Translation was enabled for IOMMU:%d but we are not in kdump mode\n",
+ iommu->index);
+ }
+ if (amd_iommu_pre_enabled)
+ amd_iommu_pre_enabled = translation_pre_enabled(iommu);

ret = init_iommu_from_acpi(iommu, h);
if (ret)
@@ -1993,8 +2003,7 @@ static int __init init_memory_definitions(struct acpi_table_header *table)
}

/*
- * Init the device table to not allow DMA access for devices and
- * suppress all page faults
+ * Init the device table to not allow DMA access for devices
*/
static void init_device_table_dma(void)
{
@@ -2130,14 +2139,48 @@ static void early_enable_iommu(struct amd_iommu *iommu)

/*
* This function finally enables all IOMMUs found in the system after
- * they have been initialized
+ * they have been initialized.
+ *
+ * Or if in kdump kernel and IOMMUs are all pre-enabled, try to copy
+ * the old content of device table entries. Not this case or copy failed,
+ * just continue as normal kernel does.
*/
static void early_enable_iommus(void)
{
struct amd_iommu *iommu;

- for_each_iommu(iommu)
- early_enable_iommu(iommu);
+
+ if (!copy_device_table()) {
+ /*
+ * If come here because of failure in copying device table from old
+ * kernel with all IOMMUs enabled, print error message and try to
+ * free allocated old_dev_tbl_cpy.
+ */
+ if (amd_iommu_pre_enabled)
+ pr_err("Failed to copy DEV table from previous kernel.\n");
+ if (old_dev_tbl_cpy != NULL)
+ free_pages((unsigned long)old_dev_tbl_cpy,
+ get_order(dev_table_size));
+
+ for_each_iommu(iommu) {
+ clear_translation_pre_enabled(iommu);
+ early_enable_iommu(iommu);
+ }
+ } else {
+ pr_info("Copied DEV table from previous kernel.\n");
+ free_pages((unsigned long)amd_iommu_dev_table,
+ get_order(dev_table_size));
+ amd_iommu_dev_table = old_dev_tbl_cpy;
+ for_each_iommu(iommu) {
+ iommu_disable_command_buffer(iommu);
+ iommu_disable_event_buffer(iommu);
+ iommu_enable_command_buffer(iommu);
+ iommu_enable_event_buffer(iommu);
+ iommu_enable_ga(iommu);
+ iommu_set_device_table(iommu);
+ iommu_flush_all_caches(iommu);
+ }
+ }

#ifdef CONFIG_IRQ_REMAP
if (AMD_IOMMU_GUEST_IR_VAPIC(amd_iommu_guest_ir))
--
2.5.5
Joerg Roedel
2017-08-15 16:20:03 UTC
Permalink
Post by Baoquan He
iommu/amd: Detect pre enabled translation
iommu/amd: add several helper functions
Revert "iommu/amd: Suppress IO_PAGE_FAULTs in kdump kernel"
iommu/amd: Define bit fields for DTE particularly
iommu/amd: Add function copy_dev_tables()
iommu/amd: copy old trans table from old kernel
iommu/amd: Do sanity check for address translation and irq remap of
old dev table entry
iommu: Add is_attach_deferred call-back to iommu-ops
iommu/amd: Use is_attach_deferred call-back
iommu/amd: Allocate memory below 4G for dev table if translation
pre-enabled
iommu/amd: Don't copy GCR3 table root pointer
iommu/amd: Disable iommu only if amd_iommu=off is specified
Applied all of them, thanks a lot for your continued effort, Baoquan!
Baoquan He
2017-08-16 01:40:01 UTC
Permalink
Post by Joerg Roedel
Post by Baoquan He
iommu/amd: Detect pre enabled translation
iommu/amd: add several helper functions
Revert "iommu/amd: Suppress IO_PAGE_FAULTs in kdump kernel"
iommu/amd: Define bit fields for DTE particularly
iommu/amd: Add function copy_dev_tables()
iommu/amd: copy old trans table from old kernel
iommu/amd: Do sanity check for address translation and irq remap of
old dev table entry
iommu: Add is_attach_deferred call-back to iommu-ops
iommu/amd: Use is_attach_deferred call-back
iommu/amd: Allocate memory below 4G for dev table if translation
pre-enabled
iommu/amd: Don't copy GCR3 table root pointer
iommu/amd: Disable iommu only if amd_iommu=off is specified
Applied all of them, thanks a lot for your continued effort, Baoquan!
Really appreciate your help and great suggestions on design and code
changes since the first post! Without your help I can't make it.

Loading...