git.agner.ch Git - linux-drm-fsl-dcu.git/log

dm array: fix a reference counting bug in shadow_ablock

An old array block could have its reference count decremented below
zero when it is being replaced in the btree by a new array block.

The fix is to increment the old ablock's reference count just before
inserting a new ablock into the btree.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org # 3.9+

dm space map: disallow decrementing a reference count below zero

The old behaviour, returning -EINVAL if a ref_count of 0 would be
decremented, was removed in commit f722063 ("dm space map: optimise
sm_ll_dec and sm_ll_inc"). To fix this regression we return an error
code from the mutator function pointer passed to sm_ll_mutate() and have
dec_ref_count() return -EINVAL if the old ref_count is 0.

Add a DMERR to reflect the potential seriousness of this error.

Also, add missing dm_tm_unlock() to sm_ll_mutate()'s error path.

With this fix the following dmts regression test now passes:
dmtest run --suite cache -n /metadata_use_kernel/

The next patch fixes the higher-level dm-array code that exposed this
regression.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org # 3.12+

dm stats: initialize read-only module parameter

The module parameter stats_current_allocated_bytes in dm-mod is
read-only. This parameter informs the user about memory
consumption. It is not supposed to be changed by the user.

However, despite being read-only, this parameter can be set on
modprobe or insmod command line:
modprobe dm-mod stats_current_allocated_bytes=12345

The kernel doesn't expect that this variable can be non-zero at module
initialization and if the user sets it, it results in warning.

This patch initializes the variable in the module init routine, so
that user-supplied value is ignored.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org # 3.12+

dm bufio: initialize read-only module parameters

Some module parameters in dm-bufio are read-only. These parameters
inform the user about memory consumption. They are not supposed to be
changed by the user.

However, despite being read-only, these parameters can be set on
modprobe or insmod command line, for example:
modprobe dm-bufio current_allocated_bytes=12345

The kernel doesn't expect that these variables can be non-zero at module
initialization and if the user sets them, it results in BUG.

This patch initializes the variables in the module init routine, so that
user-supplied values are ignored.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org # 3.2+

dm cache: actually resize cache

Commit f494a9c6b1b6dd9a9f21bbb75d9210d478eeb498 ("dm cache: cache
shrinking support") broke cache resizing support.

dm_cache_resize() is called with cache->cache_size before it gets
updated to new_size, so it is a no-op. But the dm-cache superblock is
updated with the new_size even though the backing dm-array is not
resized. Fix this by passing the new_size to dm_cache_resize().

Signed-off-by: Vincent Pelletier <plr.vincent@gmail.com>
Acked-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

dm cache: update Documentation for invalidate_cblocks's range syntax

The cache target's invalidate_cblocks message allows cache block
(cblock) ranges to be expressed with: <cblock start>-<cblock end>

The range's <cblock end> value is "one past the end", so the range
includes <cblock start> through <cblock end>-1.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Acked-by: Joe Thornber <ejt@redhat.com>

dm cache policy mq: fix promotions to occur as expected

Micro benchmarks that repeatedly issued IO to a single block were
failing to cause a promotion from the origin device to the cache.  Fix
this by not updating the stats during map() if -EWOULDBLOCK will be
returned.

The mq policy will only update stats, consider migration, etc, once per
tick period (a unit of time established between dm-cache core and the
policies).

When the IO thread calls the policy's map method, if it would like to
migrate the associated block it returns -EWOULDBLOCK, the IO then gets
handed over to a worker thread which handles the migration.  The worker
thread calls map again, to check the migration is still needed (avoids a
race among other things).  *BUT*, before this fix, if we were still in
the same tick period the stats were already updated by the previous map
call -- so the migration would no longer be requested.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

dm thin: allow pool in read-only mode to transition to read-write mode

A thin-pool may be in read-only mode because the pool's data or metadata
space was exhausted. To allow for recovery, by adding more space to the
pool, we must allow a pool to transition from PM_READ_ONLY to PM_WRITE
mode. Otherwise, running out of space will render the pool permanently
read-only.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org

dm thin: re-establish read-only state when switching to fail mode

If the thin-pool transitioned to fail mode and the thin-pool's table
were reloaded for some reason: the new table's default pool mode would
be read-write, though it will transition to fail mode during resume.

When the pool mode transitions directly from PM_WRITE to PM_FAIL we need
to re-establish the intermediate read-only state in both the metadata
and persistent-data block manager (as is usually done with the normal
pool mode transition sequence: PM_WRITE -> PM_READ_ONLY -> PM_FAIL).

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org

dm thin: always fallback the pool mode if commit fails

Rename commit_or_fallback() to commit(). Now all previous calls to
commit() will trigger the pool mode to fallback if the commit fails.

Also, check the error returned from commit() in alloc_data_block().

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org

dm thin: switch to read-only mode if metadata space is exhausted

Switch the thin pool to read-only mode in alloc_data_block() if
dm_pool_alloc_data_block() fails because the pool's metadata space is
exhausted.

Differentiate between data and metadata space in messages about no
free space available.

This issue was noticed with the device-mapper-test-suite using:
dmtest run --suite thin-provisioning -n /exhausting_metadata_space_causes_fail_mode/

The quantity of errors logged in this case must be reduced.

before patch:

device-mapper: thin: 253:4: reached low water mark for metadata device: sending event.
device-mapper: space map metadata: unable to allocate new metadata block
device-mapper: space map common: dm_tm_shadow_block() failed
device-mapper: space map metadata: unable to allocate new metadata block
device-mapper: space map common: dm_tm_shadow_block() failed
device-mapper: space map metadata: unable to allocate new metadata block
device-mapper: space map common: dm_tm_shadow_block() failed
device-mapper: space map metadata: unable to allocate new metadata block
device-mapper: space map common: dm_tm_shadow_block() failed
device-mapper: space map metadata: unable to allocate new metadata block
device-mapper: space map common: dm_tm_shadow_block() failed
<snip ... these repeat for a _very_ long while ... >
device-mapper: space map metadata: unable to allocate new metadata block
device-mapper: thin: 253:4: commit failed: error = -28
device-mapper: thin: 253:4: switching pool to read-only mode

after patch:

device-mapper: thin: 253:4: reached low water mark for metadata device: sending event.
device-mapper: space map metadata: unable to allocate new metadata block
device-mapper: thin: 253:4: no free metadata space available.
device-mapper: thin: 253:4: switching pool to read-only mode

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Acked-by: Joe Thornber <ejt@redhat.com>
Cc: stable@vger.kernel.org

dm thin: switch to read only mode if a mapping insert fails

Switch the thin pool to read-only mode when dm_thin_insert_block() fails
since there is little reason to expect the cause of the failure to be
resolved without further action by user space.

This issue was noticed with the device-mapper-test-suite using:
dmtest run --suite thin-provisioning -n /exhausting_metadata_space_causes_fail_mode/

The quantity of errors logged in this case must be reduced.

before patch:

device-mapper: thin: dm_thin_insert_block() failed
device-mapper: space map metadata: unable to allocate new metadata block
device-mapper: thin: dm_thin_insert_block() failed
device-mapper: space map metadata: unable to allocate new metadata block
device-mapper: thin: dm_thin_insert_block() failed
device-mapper: space map metadata: unable to allocate new metadata block
device-mapper: thin: dm_thin_insert_block() failed
device-mapper: space map metadata: unable to allocate new metadata block
device-mapper: thin: dm_thin_insert_block() failed
device-mapper: space map metadata: unable to allocate new metadata block
device-mapper: thin: dm_thin_insert_block() failed
device-mapper: space map metadata: unable to allocate new metadata block
device-mapper: thin: dm_thin_insert_block() failed
device-mapper: space map metadata: unable to allocate new metadata block
device-mapper: thin: dm_thin_insert_block() failed
device-mapper: space map metadata: unable to allocate new metadata block
device-mapper: thin: dm_thin_insert_block() failed
device-mapper: space map metadata: unable to allocate new metadata block
device-mapper: thin: dm_thin_insert_block() failed
device-mapper: space map metadata: unable to allocate new metadata block
device-mapper: space map metadata: unable to allocate new metadata block
device-mapper: space map metadata: unable to allocate new metadata block
device-mapper: space map metadata: unable to allocate new metadata block
device-mapper: space map metadata: unable to allocate new metadata block
device-mapper: space map metadata: unable to allocate new metadata block
<snip ... these repeat for a long while ... >
device-mapper: space map metadata: unable to allocate new metadata block
device-mapper: space map common: dm_tm_shadow_block() failed
device-mapper: thin: 253:4: no free metadata space available.
device-mapper: thin: 253:4: switching pool to read-only mode

after patch:

device-mapper: space map metadata: unable to allocate new metadata block
device-mapper: thin: 253:4: dm_thin_insert_block() failed: error = -28
device-mapper: thin: 253:4: switching pool to read-only mode

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org

dm space map metadata: return on failure in sm_metadata_new_block

Commit 2fc48021f4afdd109b9e52b6eef5db89ca80bac7 ("dm persistent
metadata: add space map threshold callback") introduced a regression
to the metadata block allocation path that resulted in errors being
ignored.  This regression was uncovered by running the following
device-mapper-test-suite test:
dmtest run --suite thin-provisioning -n /exhausting_metadata_space_causes_fail_mode/

The ignored error codes in sm_metadata_new_block() could crash the
kernel through use of either the dm-thin or dm-cache targets, e.g.:

device-mapper: thin: 253:4: reached low water mark for metadata device: sending event.
device-mapper: space map metadata: unable to allocate new metadata block
general protection fault: 0000 [#1] SMP
...
Workqueue: dm-thin do_worker [dm_thin_pool]
task: ffff880035ce2ab0 ti: ffff88021a054000 task.ti: ffff88021a054000
RIP: 0010:[<ffffffffa0331385>]  [<ffffffffa0331385>] metadata_ll_load_ie+0x15/0x30 [dm_persistent_data]
RSP: 0018:ffff88021a055a68  EFLAGS: 00010202
RAX: 003fc8243d212ba0 RBX: ffff88021a780070 RCX: ffff88021a055a78
RDX: ffff88021a055a78 RSI: 0040402222a92a80 RDI: ffff88021a780070
RBP: ffff88021a055a68 R08: ffff88021a055ba4 R09: 0000000000000010
R10: 0000000000000000 R11: 00000002a02e1000 R12: ffff88021a055ad4
R13: 0000000000000598 R14: ffffffffa0338470 R15: ffff88021a055ba4
FS:  0000000000000000(0000) GS:ffff88033fca0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007f467c0291b8 CR3: 0000000001a0b000 CR4: 00000000000007e0
Stack:
ffff88021a055ab8 ffffffffa0332020 ffff88021a055b30 0000000000000001
ffff88021a055b30 0000000000000000 ffff88021a055b18 0000000000000000
ffff88021a055ba4 ffff88021a055b98 ffff88021a055ae8 ffffffffa033304c
Call Trace:
[<ffffffffa0332020>] sm_ll_lookup_bitmap+0x40/0xa0 [dm_persistent_data]
[<ffffffffa033304c>] sm_metadata_count_is_more_than_one+0x8c/0xc0 [dm_persistent_data]
[<ffffffffa0333825>] dm_tm_shadow_block+0x65/0x110 [dm_persistent_data]
[<ffffffffa0331b00>] sm_ll_mutate+0x80/0x300 [dm_persistent_data]
[<ffffffffa0330e60>] ? set_ref_count+0x10/0x10 [dm_persistent_data]
[<ffffffffa0331dba>] sm_ll_inc+0x1a/0x20 [dm_persistent_data]
[<ffffffffa0332270>] sm_disk_new_block+0x60/0x80 [dm_persistent_data]
[<ffffffff81520036>] ? down_write+0x16/0x40
[<ffffffffa001e5c4>] dm_pool_alloc_data_block+0x54/0x80 [dm_thin_pool]
[<ffffffffa001b23c>] alloc_data_block+0x9c/0x130 [dm_thin_pool]
[<ffffffffa001c27e>] provision_block+0x4e/0x180 [dm_thin_pool]
[<ffffffffa001fe9a>] ? dm_thin_find_block+0x6a/0x110 [dm_thin_pool]
[<ffffffffa001c57a>] process_bio+0x1ca/0x1f0 [dm_thin_pool]
[<ffffffff8111e2ed>] ? mempool_free+0x8d/0xa0
[<ffffffffa001d755>] process_deferred_bios+0xc5/0x230 [dm_thin_pool]
[<ffffffffa001d911>] do_worker+0x51/0x60 [dm_thin_pool]
[<ffffffff81067872>] process_one_work+0x182/0x3b0
[<ffffffff81068c90>] worker_thread+0x120/0x3a0
[<ffffffff81068b70>] ? manage_workers+0x160/0x160
[<ffffffff8106eb2e>] kthread+0xce/0xe0
[<ffffffff8106ea60>] ? kthread_freezable_should_stop+0x70/0x70
[<ffffffff8152af6c>] ret_from_fork+0x7c/0xb0
[<ffffffff8106ea60>] ? kthread_freezable_should_stop+0x70/0x70
[<ffffffff8152af6c>] ret_from_fork+0x7c/0xb0
[<ffffffff8106ea60>] ? kthread_freezable_should_stop+0x70/0x70

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Acked-by: Joe Thornber <ejt@redhat.com>
Cc: stable@vger.kernel.org # v3.10+

dm table: fail dm_table_create on dm_round_up overflow

The dm_round_up function may overflow to zero. In this case,
dm_table_create() must fail rather than go on to allocate an empty array
with alloc_targets().

This fixes a possible memory corruption that could be caused by passing
too large a number in "param->target_count".

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org

dm snapshot: avoid snapshot space leak on crash

There is a possible leak of snapshot space in case of crash.

The reason for space leaking is that chunks in the snapshot device are
allocated sequentially, but they are finished (and stored in the metadata)
out of order, depending on the order in which copying finished.

For example, supposed that the metadata contains the following records
SUPERBLOCK
METADATA (blocks 0 ... 250)
DATA 0
DATA 1
DATA 2
...
DATA 250

Now suppose that you allocate 10 new data blocks 251-260. Suppose that
copying of these blocks finish out of order (block 260 finished first
and the block 251 finished last). Now, the snapshot device looks like
this:
SUPERBLOCK
METADATA (blocks 0 ... 250, 260, 259, 258, 257, 256)
DATA 0
DATA 1
DATA 2
...
DATA 250
DATA 251
DATA 252
DATA 253
DATA 254
DATA 255
METADATA (blocks 255, 254, 253, 252, 251)
DATA 256
DATA 257
DATA 258
DATA 259
DATA 260

Now, if the machine crashes after writing the first metadata block but
before writing the second metadata block, the space for areas DATA 250-255
is leaked, it contains no valid data and it will never be used in the
future.

This patch makes dm-snapshot complete exceptions in the same order they
were allocated, thus fixing this bug.

Note: when backporting this patch to the stable kernel, change the version
field in the following way:
* if version in the stable kernel is {1, 11, 1}, change it to {1, 12, 0}
* if version in the stable kernel is {1, 10, 0} or {1, 10, 1}, change it
to {1, 10, 2}
Userspace reads the version to determine if the bug was fixed, so the
version change is needed.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org

dm delay: fix a possible deadlock due to shared workqueue

The dm-delay target uses a shared workqueue for multiple instances. This
can cause deadlock if two or more dm-delay targets are stacked on the top
of each other.

This patch changes dm-delay to use a per-instance workqueue.

Cc: stable@vger.kernel.org # 2.6.22+
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

dm cache: resolve small nits and improve Documentation

Document passthrough mode, cache shrinking, and cache invalidation.
Also, use strcasecmp() and hlist_unhashed().

Reported-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

dm cache: add cache block invalidation support

Cache block invalidation is removing an entry from the cache without
writing it back.  Cache blocks can be invalidated via the
'invalidate_cblocks' message, which takes an arbitrary number of cblock
ranges:
   invalidate_cblocks [<cblock>|<cblock begin>-<cblock end>]*

E.g.
   dmsetup message my_cache 0 invalidate_cblocks 2345 3456-4567 5678-6789

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

dm cache: add remove_cblock method to policy interface

Implement policy_remove_cblock() and add remove_cblock method to the mq
policy. These methods will be used by the following cache block
invalidation patch which adds the 'invalidate_cblocks' message to the
cache core.

Also, update some comments in dm-cache-policy.h

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

dm cache policy mq: reduce memory requirements

Rather than storing the cblock in each cache entry, we allocate all
entries in an array and infer the cblock from the entry position.

Saves 4 bytes of memory per cache block. In addition, this gives us an
easy way of looking up cache entries by cblock.

We no longer need to keep an explicit bitset to track which cblocks
have been allocated. And no searching is needed to find free cblocks.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

dm cache metadata: check the metadata version when reading the superblock

Need to check the version to verify on-disk metadata is supported.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

dm cache: add passthrough mode

"Passthrough" is a dm-cache operating mode (like writethrough or
writeback) which is intended to be used when the cache contents are not
known to be coherent with the origin device.  It behaves as follows:

* All reads are served from the origin device (all reads miss the cache)
* All writes are forwarded to the origin device; additionally, write
  hits cause cache block invalidates

This mode decouples cache coherency checks from cache device creation,
largely to avoid having to perform coherency checks while booting.  Boot
scripts can create cache devices in passthrough mode and put them into
service (mount cached filesystems, for example) without having to worry
about coherency.  Coherency that exists is maintained, although the
cache will gradually cool as writes take place.

Later, applications can perform coherency checks, the nature of which
will depend on the type of the underlying storage.  If coherency can be
verified, the cache device can be transitioned to writethrough or
writeback mode while still warm; otherwise, the cache contents can be
discarded prior to transitioning to the desired operating mode.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Morgan Mears <Morgan.Mears@netapp.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

dm cache: cache shrinking support

Allow a cache to shrink if the blocks being removed from the cache are
not dirty.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

dm cache: promotion optimisation for writes

If a write block triggers promotion and covers a whole block we can
avoid a copy.

Introduce dm_{hook,unhook}_bio to simplify saving and restoring bio
fields (bi_private is now used by overwrite). Switch writethrough
support over to using these helpers too.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

dm cache: be much more aggressive about promoting writes to discarded blocks

Previously these promotions only got priority if there were unused cache
blocks. Now we give them priority if there are any clean blocks in the
cache.

The fio_soak_test in the device-mapper-test-suite now gives uniform
performance across subvolumes (~16 seconds).

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

dm cache policy mq: implement writeback_work() and mq_{set,clear}_dirty()

There are now two multiqueues for in cache blocks. A clean one and a
dirty one.

writeback_work comes from the dirty one. Demotions come from the clean
one.

There are two benefits:
- Performance improvement, since demoting a clean block is a noop.
- The cache cleans itself when io load is light.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

dm cache: optimize commit_if_needed

Check commit_requested flag _before_ calling
dm_cache_changed_this_transaction() superfluously.

Also, be sure to set last_commit_jiffies _after_ dm_cache_commit()
completes.

Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

dm space map disk: optimise sm_disk_dec_block

Don't waste time spotting blocks that have been allocated and then freed
in the same transaction.

The extra lookup is expensive, and I don't think it really gives us much.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

MAINTAINERS: add reference to device-mapper's linux-dm.git tree

Signed-off-by: Mike Snitzer <snitzer@redhat.com>

dm: fix Kconfig menu indentation

The option DM_LOG_USERSPACE is sub-option of DM_MIRROR, so place it
right after DM_MIRROR. Doing so fixes various other Device mapper
targets/features to be properly nested under "Device mapper support".

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

dm: allow remove to be deferred

This patch allows the removal of an open device to be deferred until
it is closed. (Previously such a removal attempt would fail.)

The deferred remove functionality is enabled by setting the flag
DM_DEFERRED_REMOVE in the ioctl structure on DM_DEV_REMOVE or
DM_REMOVE_ALL ioctl.

On return from DM_DEV_REMOVE, the flag DM_DEFERRED_REMOVE indicates if
the device was removed immediately or flagged to be removed on close -
if the flag is clear, the device was removed.

On return from DM_DEV_STATUS and other ioctls, the flag
DM_DEFERRED_REMOVE is set if the device is scheduled to be removed on
closure.

A device that is scheduled to be deleted can be revived using the
message "@cancel_deferred_remove". This message clears the
DMF_DEFERRED_REMOVE flag so that the device won't be deleted on close.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

dm table: print error on preresume failure

If preresume fails it is worth logging an error given that a device is
left suspended due to the failure.

This change was motivated by local preresume error logging that was
added to the cache target ("preresume failed"). Elevating this
target-agnostic context for the where the target-specific error occurred
relative to the DM core's callouts makes sense.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Joe Thornber <ejt@redhat.com>

dm crypt: add TCW IV mode for old CBC TCRYPT containers

dm-crypt can already activate TCRYPT (TrueCrypt compatible) containers
in LRW or XTS block encryption mode.

TCRYPT containers prior to version 4.1 use CBC mode with some additional
tweaks, this patch adds support for these containers.

This new mode is implemented using special IV generator named TCW
(TrueCrypt IV with whitening).  TCW IV only supports containers that are
encrypted with one cipher (Tested with AES, Twofish, Serpent, CAST5 and
TripleDES).

While this mode is legacy and is known to be vulnerable to some
watermarking attacks (e.g. revealing of hidden disk existence) it can
still be useful to activate old containers without using 3rd party
software or for independent forensic analysis of such containers.

(Both the userspace and kernel code is an independent implementation
based on the format documentation and it completely avoids use of
original source code.)

The TCW IV generator uses two additional keys: Kw (whitening seed, size
is always 16 bytes - TCW_WHITENING_SIZE) and Kiv (IV seed, size is
always the IV size of the selected cipher).  These keys are concatenated
at the end of the main encryption key provided in mapping table.

While whitening is completely independent from IV, it is implemented
inside IV generator for simplification.

The whitening value is always 16 bytes long and is calculated per sector
from provided Kw as initial seed, xored with sector number and mixed
with CRC32 algorithm.  Resulting value is xored with ciphertext sector
content.

IV is calculated from the provided Kiv as initial IV seed and xored with
sector number.

Detailed calculation can be found in the Truecrypt documentation for
version < 4.1 and will also be described on dm-crypt site, see:
http://code.google.com/p/cryptsetup/wiki/DMCrypt

The experimental support for activation of these containers is already
present in git devel brach of cryptsetup.

Signed-off-by: Milan Broz <gmazyland@gmail.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

dm crypt: properly handle extra key string in initialization

Some encryption modes use extra keys (e.g. loopAES has IV seed) which
are not used in block cipher initialization but are part of key string
in table constructor.

This patch adds an additional field which describes the length of the
extra key(s) and substracts it before real key encryption setting.

The key_size always includes the size, in bytes, of the key provided
in mapping table.

The key_parts describes how many parts (usually keys) are contained in
the whole key buffer.  And key_extra_size contains size in bytes of
additional keys part (this number of bytes must be subtracted because it
is processed by the IV generator).

| K1 | K2 | .... | K64 |      Kiv       |
|----------- key_size ----------------- |
|                      |-key_extra_size-|
|     [64 keys]        |  [1 key]       | => key_parts = 65

Example where key string contains main key K, whitening key
Kw and IV seed Kiv:

|     K       |   Kiv   |       Kw      |
|--------------- key_size --------------|
|             |-----key_extra_size------|
|  [1 key]    | [1 key] |     [1 key]   | => key_parts = 3

Because key_extra_size is calculated during IV mode setting, key
initialization is moved after this step.

For now, this change has no effect to supported modes (thanks to ilog2
rounding) but it is required by the following patch.

Also, fix a sparse warning in crypt_iv_lmk_one().

Signed-off-by: Milan Broz <gmazyland@gmail.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

dm cache: log error message if dm_kcopyd_copy() fails

A migration failure should be logged (albeit limited).

Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

dm cache: use cell_defer() boolean argument consistently

Fix a few cell_defer() calls that weren't passing a bool.

Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

dm cache: return -EINVAL if the user specifies unknown cache policy

Return -EINVAL when the specified cache policy is unknown rather than
returning -ENOMEM.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

dm cache metadata: return bool from __superblock_all_zeroes

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

dm cache policy mq: a few small fixes

Rename takeout_queue to concat_queue.

Fix a harmless bug in mq policies pop() function. Currently pop()
always succeeds, with up coming changes this wont be the case.

Fix typo in comment above pre_cache_to_cache prototype.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

dm cache policy: remove return from void policy_remove_mapping

No need to return from a void function.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

dm cache: improve efficiency of quiescing flag management

Make the quiescing flag an atomic_t and stop protecting it with a spin
lock.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

dm cache: fix a race condition between queuing new migrations and quiescing for a shutdown

The code that was trying to do this was inadequate.  The postsuspend
method (in ioctl context), needs to wait for the worker thread to
acknowledge the request to quiesce.  Otherwise the migration count may
drop to zero temporarily before the worker thread realises we're
quiescing.  In this case the target will be taken down, but the worker
thread may have issued a new migration, which will cause an oops when
it completes.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org # 3.9+

dm cache: io destined for the cache device can now serve as tick bios

Previously only origin bios could trigger ticks, which meant if all
the io was destined for the cache no ticks were generated. If no ticks
are generated then multiple hits, and movements in general, are
attributed to the same tick.

Only a stop gap fix, we need a better solution.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

dm cache policy mq: protect residency method with existing mutex

It is safe to use a mutex in mq_residency() at this point since it is
only called from ioctl context. But future-proof mq_residency() by
using might_sleep() to catch new contexts that cannot sleep.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

dm array: fix bug in growing array

Entries would be lost if the old tail block was partially filled.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org # 3.9+

dm mpath: requeue I/O during pg_init

When pg_init is running no I/O can be submitted to the underlying
devices, as the path priority etc might change.  When using queue_io for
this, requests will be piling up within multipath as the block I/O
scheduler just sees a _very fast_ device.  All of this queued I/O has to
be resubmitted from within multipathing once pg_init is done.

This approach has the problem that it's virtually impossible to
abort I/O when pg_init is running, and we're adding heavy load
to the devices after pg_init since all of the queued I/O needs to be
resubmitted _before_ any requests can be pulled off of the request queue
and normal operation continues.

This patch will requeue the I/O that triggers the pg_init call, and
return 'busy' when pg_init is in progress.  With these changes the block
I/O scheduler will stop submitting I/O during pg_init, resulting in a
quicker path switch and less I/O pressure (and memory consumption) after
pg_init.

Signed-off-by: Hannes Reinecke <hare@suse.de>
[patch header edited for clarity and typos by Mike Snitzer]
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

dm mpath: fix race condition between multipath_dtr and pg_init_done

Whenever multipath_dtr() is happening we must prevent queueing any
further path activation work.  Implement this by adding a new
'pg_init_disabled' flag to the multipath structure that denotes future
path activation work should be skipped if it is set.  By disabling
pg_init and then re-enabling in flush_multipath_work() we also avoid the
potential for pg_init to be initiated while suspending an mpath device.

Without this patch a race condition exists that may result in a kernel
panic:

1) If after pg_init_done() decrements pg_init_in_progress to 0, a call
   to wait_for_pg_init_completion() assumes there are no more pending path
   management commands.
2) If pg_init_required is set by pg_init_done(), due to retryable
   mode_select errors, then process_queued_ios() will again queue the
   path activation work.
3) If free_multipath() completes before activate_path() work is called a
   NULL pointer dereference like the following can be seen when
   accessing members of the recently destructed multipath:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000090
RIP: 0010:[<ffffffffa003db1b>]  [<ffffffffa003db1b>] activate_path+0x1b/0x30 [dm_multipath]
[<ffffffff81090ac0>] worker_thread+0x170/0x2a0
[<ffffffff81096c80>] ? autoremove_wake_function+0x0/0x40

[switch to disabling pg_init in flush_multipath_work & header edits by Mike Snitzer]
Signed-off-by: Shiva Krishna Merla <shivakrishna.merla@netapp.com>
Reviewed-by: Krishnasamy Somasundaram <somasundaram.krishnasamy@netapp.com>
Tested-by: Speagle Andy <Andy.Speagle@netapp.com>
Acked-by: Junichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org

dm: allocate buffer for messages with small number of arguments using GFP_NOIO

dm-mpath and dm-thin must process messages even if some device is
suspended, so we allocate argv buffer with GFP_NOIO. These messages have
a small fixed number of arguments.

On the other hand, dm-switch needs to process bulk data using messages
so excessive use of GFP_NOIO could cause trouble.

The patch also lowers the default number of arguments from 64 to 8, so
that there is smaller load on GFP_NOIO allocations.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org
Acked-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

Linux 3.12-rc5

Merge git://www.linux-watchdog.org/linux-watchdog

Pull watchdog fixes from Wim Van Sebroeck:
"This will fix a deadlock on the ts72xx_wdt driver, fix bitmasks in the
  kempld_wdt driver and fix a section mismatch in the sunxi_wdt driver"

* git://www.linux-watchdog.org/linux-watchdog:
  watchdog: sunxi: Fix section mismatch
  watchdog: kempld_wdt: Fix bit mask definition
  watchdog: ts72xx_wdt: locking bug in ioctl

watchdog: sunxi: Fix section mismatch

This driver has a section mismatch, for probe and remove functions,
leading to the following warning during the compilation.

WARNING: drivers/watchdog/built-in.o(.data+0x24): Section mismatch in
reference from the variable sunxi_wdt_driver to the function
.init.text:sunxi_wdt_probe()
The variable sunxi_wdt_driver references
the function __init sunxi_wdt_probe()

Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>

watchdog: kempld_wdt: Fix bit mask definition

STAGE_CFG bits are defined as [5:4] bits. However, '(((x) & 0x30) << 4)'
handles [9:8] bits. Thus, it should be fixed in order to handle
[5:4] bits.

Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>

watchdog: ts72xx_wdt: locking bug in ioctl

Calling the WDIOC_GETSTATUS & WDIOC_GETBOOTSTATUS and twice will cause a
interruptible deadlock.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>

Merge tag 'fixes-for-linus' of git://git./linux/kernel/git/arm/arm-soc

Pull ARM SoC fixes from Olof Johansson:
"A small batch of fixes this week, mostly OMAP related.  Nothing stands
  out as particularly controversial.

  Also a fix for a 3.12-rc1 timer regression for Exynos platforms,
  including the Chromebooks"

* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
  ARM: exynos: dts: Update 5250 arch timer node with clock frequency
  ARM: OMAP2: RX-51: Add missing max_current to rx51_lp5523_led_config
  ARM: mach-omap2: board-generic: fix undefined symbol
  ARM: dts: Fix pinctrl mask for omap3
  ARM: OMAP3: Fix hardware detection for omap3630 when booted with device tree
  ARM: OMAP2: gpmc-onenand: fix sync mode setup with DT

ARM: exynos: dts: Update 5250 arch timer node with clock frequency

Without the "clock-frequency" property in arch timer node, could able
to see the below crash dump.

[<c0014e28>] (unwind_backtrace+0x0/0xf4) from [<c0011808>] (show_stack+0x10/0x14)
[<c0011808>] (show_stack+0x10/0x14) from [<c036ac1c>] (dump_stack+0x7c/0xb0)
[<c036ac1c>] (dump_stack+0x7c/0xb0) from [<c01ab760>] (Ldiv0_64+0x8/0x18)
[<c01ab760>] (Ldiv0_64+0x8/0x18) from [<c0062f60>] (clockevents_config.part.2+0x1c/0x74)
[<c0062f60>] (clockevents_config.part.2+0x1c/0x74) from [<c0062fd8>] (clockevents_config_and_register+0x20/0x2c)
[<c0062fd8>] (clockevents_config_and_register+0x20/0x2c) from [<c02b8e8c>] (arch_timer_setup+0xa8/0x134)
[<c02b8e8c>] (arch_timer_setup+0xa8/0x134) from [<c04b47b4>] (arch_timer_init+0x1f4/0x24c)
[<c04b47b4>] (arch_timer_init+0x1f4/0x24c) from [<c04b40d8>] (clocksource_of_init+0x34/0x58)
[<c04b40d8>] (clocksource_of_init+0x34/0x58) from [<c049ed8c>] (time_init+0x20/0x2c)
[<c049ed8c>] (time_init+0x20/0x2c) from [<c049b95c>] (start_kernel+0x1e0/0x39c)

THis is because the Exynos u-boot, for example on the Chromebooks, doesn't set
up the CNTFRQ register as expected by arch_timer. Instead, we have to specify
the frequency in the device tree like this.

Signed-off-by: Yuvaraj Kumar C D <yuvaraj.cd@samsung.com>
[olof: Changed subject, added comment, elaborated on commit message]
Signed-off-by: Olof Johansson <olof@lixom.net>

Merge tag 'fixes-against-v3.12-rc3-take2' of git://git./linux/kernel/git/tmlind/linux-omap into fixes

From Tony Lindgren:

Few fixes for omap3 related hangs and errors that people have
noticed now that people are actually using the device tree
based booting for omap3.

Also one regression fix for timer compile for dra7xx when
omap5 is not selected, and a LED regression fix for n900.

* tag 'fixes-against-v3.12-rc3-take2' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap:
  ARM: OMAP2: RX-51: Add missing max_current to rx51_lp5523_led_config
  ARM: mach-omap2: board-generic: fix undefined symbol
  ARM: dts: Fix pinctrl mask for omap3
  ARM: OMAP3: Fix hardware detection for omap3630 when booted with device tree
  ARM: OMAP2: gpmc-onenand: fix sync mode setup with DT

Signed-off-by: Olof Johansson <olof@lixom.net>

Merge branch 'parisc-3.12' of git://git./linux/kernel/git/deller/parisc-linux

Pull parisc fixes from Helge Deller:
"This patchset includes a bugfix to prevent a kernel crash when memory
  in page zero is accessed by the kernel itself, e.g.  via
  probe_kernel_read().

  Furthermore we now export flush_cache_page() which is needed
  (indirectly) by the lustre filesystem.  The other patches remove
  unused functions and optimizes the page fault handler to only evaluate
  variables if needed, which again protects against possible kernel
  crashes"

* 'parisc-3.12' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
  parisc: let probe_kernel_read() capture access to page zero
  parisc: optimize variable initialization in do_page_fault
  parisc: fix interruption handler to respect pagefault_disable()
  parisc: mark parisc_terminate() noreturn and cold.
  parisc: remove unused syscall_ipi() function.
  parisc: kill SMP single function call interrupt
  parisc: Export flush_cache_page() (needed by lustre)

Merge branch 'fixes' of git://git.infradead.org/users/vkoul/slave-dma

Pull slave-dmaengine fixes from Vinod Koul:
"Another week, time to send another fixes request taking time out of
  extended weekend for the festivities in this part of the world.

  We have two fixes from Sergei for rcar driver and one fixing memory
  leak of edma driver by Geyslan"

* 'fixes' of git://git.infradead.org/users/vkoul/slave-dma:
  dma: edma.c: remove edma_desc leakage
  rcar-hpbdma: add parameter to set_slave() method
  rcar-hpbdma: remove shdma_free_irq() calls

parisc: let probe_kernel_read() capture access to page zero

Signed-off-by: Helge Deller <deller@gmx.de>

parisc: optimize variable initialization in do_page_fault

The attached change defers the initialization of the variables tsk, mm
and flags until they are needed. As a result, the code won't crash if a
kernel probe is done with a corrupt context and the code will be better
optimized.

Signed-off-by: John David Anglin <dave.anglin@bell.net>
Signed-off-by: Helge Deller <deller@gmx.de>

parisc: fix interruption handler to respect pagefault_disable()

Running an "echo t > /proc/sysrq-trigger" crashes the parisc kernel.  The
problem is, that in print_worker_info() we try to read the workqueue info via
the probe_kernel_read() functions which use pagefault_disable() to avoid
crashes like this:
    probe_kernel_read(&pwq, &worker->current_pwq, sizeof(pwq));
    probe_kernel_read(&wq, &pwq->wq, sizeof(wq));
    probe_kernel_read(name, wq->name, sizeof(name) - 1);

The problem here is, that the first probe_kernel_read(&pwq) might return zero
in pwq and as such the following probe_kernel_reads() try to access contents of
the page zero which is read protected and generate a kernel segfault.

With this patch we fix the interruption handler to call parisc_terminate()
directly only if pagefault_disable() was not called (in which case
preempt_count()==0).  Otherwise we hand over to the pagefault handler which
will try to look up the faulting address in the fixup tables.

Signed-off-by: Helge Deller <deller@gmx.de>
Cc: <stable@vger.kernel.org> # v3.0+
Signed-off-by: John David Anglin <dave.anglin@bell.net>
Signed-off-by: Helge Deller <deller@gmx.de>

parisc: mark parisc_terminate() noreturn and cold.

Signed-off-by: Helge Deller <deller@gmx.de>

parisc: remove unused syscall_ipi() function.

Signed-off-by: Helge Deller <deller@gmx.de>

parisc: kill SMP single function call interrupt

Commit 9a46ad6d6df3b54 "smp: make smp_call_function_many() use logic
similar to smp_call_function_single()" has unified the way to handle
single and multiple cross-CPU function calls. Now only one interrupt
is needed for architecture specific code to support generic SMP function
call interfaces, so kill the redundant single function call interrupt.

Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Cc: Jiang Liu <liuj97@gmail.com>
Signed-off-by: Helge Deller <deller@gmx.de>

parisc: Export flush_cache_page() (needed by lustre)

ERROR: "flush_cache_page" [drivers/staging/lustre/lustre/libcfs/libcfs.ko] undefined!

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Helge Deller <deller@gmx.de>

vfs: allow O_PATH file descriptors for fstatfs()

Olga reported that file descriptors opened with O_PATH do not work with
fstatfs(), found during further development of ksh93's thread support.

There is no reason to not allow O_PATH file descriptors here (fstatfs is
very much a path operation), so use "fdget_raw()". See commit
55815f70147d ("vfs: make O_PATH file descriptors usable for 'fstat()'")
for a very similar issue reported for fstat() by the same team.

Reported-and-tested-by: ольга крыжановская <olga.kryzhanovska@gmail.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: stable@kernel.org # O_PATH introduced in 3.0+
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'ext4_for_linus_stable' of git://git./linux/kernel/git/tytso/ext4

Pull ext4 bugfixes from Ted Ts'o:
"A bug fix and performance regression fix for ext4"

* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: fix memory leak in xattr
ext4: fix performance regression in writeback of random writes

Merge branch 'for-linus' of git://git./linux/kernel/git/mason/linux-btrfs

Pull btrfs fixes from Chris Mason:
"We've got more bug fixes in my for-linus branch:

  One of these fixes another corner of the compression oops from last
  time.  Miao nailed down some problems with concurrent snapshot
  deletion and drive balancing.

  I kept out one of his patches for more testing, but these are all
  stable"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: fix oops caused by the space balance and dead roots
  Btrfs: insert orphan roots into fs radix tree
  Btrfs: limit delalloc pages outside of find_delalloc_range
  Btrfs: use right root when checking for hash collision

Merge tag 'sound-3.12' of git://git./linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
"All stable fixes except for a trivial headset mic fixup: the removal
  of bogus frame checks in snd-usb-usx2y driver that have regressed in
  the recent kernel versions, the HD-audio HDMI channel map fix, and a
  few HD-audio device-specific fixes"

* tag 'sound-3.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: hda - Sony VAIO Pro 13 (haswell) now has a working headset jack
  ALSA: hda - Add a headset mic model for ALC269 and friends
  ALSA: hda - Fix microphone for Sony VAIO Pro 13 (Haswell model)
  ALSA: hda - Add fixup for ASUS N56VZ
  ALSA: hda - hdmi: Fix channel map switch not taking effect
  ALSA: hda - Fix mono speakers and headset mic on Dell Vostro 5470
  ALSA: snd-usb-usx2y: remove bogus frame checks

Merge branch 'i2c/for-current' of git://git./linux/kernel/git/wsa/linux

Pull i2c fixes from Wolfram Sang:
"We had various reports of problems with deferred probing in the I2C
  subsystem, so this pull requst is a little bigger than usual.

  Most issues should be addressed now so devices will be found
  correctly.  A few ususal driver bugfixes are in here, too"

* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: i2c-mux-pinctrl: use deferred probe when adapter not found
  i2c: i2c-arb-gpio-challenge: use deferred probe when adapter not found
  i2c: i2c-mux-gpio: use deferred probing
  i2c: i2c-mux-gpio: don't ignore of_get_named_gpio errors
  i2c: omap: Clear ARDY bit twice
  i2c: Not all adapters have a parent
  i2c: i2c-stu300: replace platform_driver_probe to support deferred probing
  i2c: i2c-mxs: replace platform_driver_probe to support deferred probing
  i2c: i2c-imx: replace platform_driver_probe to support deferred probing
  i2c: i2c-designware-platdrv: replace platform_driver_probe to support deferred probing

ext4: fix memory leak in xattr

If we take the 2nd retry path in ext4_expand_extra_isize_ea, we
potentionally return from the function without having freed these
allocations.  If we don't do the return, we over-write the previous
allocation pointers, so we leak either way.

Spotted with Coverity.

[ Fixed by tytso to set is and bs to NULL after freeing these
  pointers, in case in the retry loop we later end up triggering an
  error causing a jump to cleanup, at which point we could have a double
  free bug. -- Ted ]

Signed-off-by: Dave Jones <davej@fedoraproject.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Cc: stable@vger.kernel.org

Merge branch 'core-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull gcc "asm goto" miscompilation workaround from Ingo Molnar:
"This is the fix for the GCC miscompilation discussed in the following
  lkml thread:

    [x86] BUG: unable to handle kernel paging request at 00740060

  The bug in GCC has been fixed by Jakub and the fix will be part of the
  GCC 4.8.2 release expected to be released next week - so the quirk's
  version test checks for <= 4.8.1.

  The quirk is only added to compiler-gcc4.h and not to the higher level
  compiler.h because all asm goto uses are behind a feature check"

* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  compiler/gcc4: Add quirk for 'asm goto' miscompilation bug

Merge branch 'x86-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull x86 fixes from Ingo Molnar:
"A build fix and a reboot quirk"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/reboot: Add reboot quirk for Dell Latitude E5410
x86, build, pci: Fix PCI_MSI build on !SMP

Merge tag 'arc-fixes-for-3.12-part3' of git://git./linux/kernel/git/vgupta/arc

Pull ARC fix from Vineet Gupta:
"Fix for broken gdb 'jump'"

* tag 'arc-fixes-for-3.12-part3' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
ARC: Ignore ptrace SETREGSET request for synthetic register "stop_pc"

ARC: Ignore ptrace SETREGSET request for synthetic register "stop_pc"

ARCompact TRAP_S insn used for breakpoints, commits before exception is
taken (updating architectural PC). So ptregs->ret contains next-PC and
not the breakpoint PC itself. This is different from other restartable
exceptions such as TLB Miss where ptregs->ret has exact faulting PC.
gdb needs to know exact-PC hence ARC ptrace GETREGSET provides for
@stop_pc which returns ptregs->ret vs. EFA depending on the
situation.

However, writing stop_pc (SETREGSET request), which updates ptregs->ret
doesn't makes sense stop_pc doesn't always correspond to that reg as
described above.

This was not an issue so far since user_regs->ret / user_regs->stop_pc
had same value and both writing to ptregs->ret was OK, needless, but NOT
broken, hence not observed.

With gdb "jump", they diverge, and user_regs->ret updating ptregs is
overwritten immediately with stop_pc, which this patch fixes.

Reported-by: Anton Kolesov <akolesov@synopsys.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>

Merge branch 'upstream' of git://git.linux-mips.org/ralf/upstream-linus

Pull MIPS fix from Ralf Baechle:
"Just one fix.  The stack protector was loading the value of the canary
  instead of its address"

* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus:
  MIPS: stack protector: Fix per-task canary switch

Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux

Pull drm fixes from Dave Airlie:
"All over the map..

   - nouveau:
     disable MSI, needs more work, will try again next merge window
   - radeon:
      audio + uvd regression fixes, dpm fixes, reset fixes
   - i915:
     the dpms fix might fix your haswell

  And one pain in the ass revert, so we have VGA arbitration that when
  implemented 4-5 years ago really hoped that GPUs could remove
  themselves from arbitration completely once they had a kernel driver.

  It seems Intel hw designers decided that was too nice a facility to
  allow us to have so they removed it when they went on-die (so since
  Ironlake at least).  Now Alex Williamson added support for VGA
  arbitration for newer GPUs however this now exposes itself to
  userspace as requireing arbitration of GPU VGA regions and the X
  server gets involved and disables things that it can't handle when VGA
  access is possibly required around every operation.

  So in order to not break userspace we just reverted things back to the
  old known broken status so maybe we can try and design out way out.

  Ville also had a patch to use stop machine for the two times Intel
  needs to access VGA space, that might be acceptable with some rework,
  but for now myself and Daniel agreed to just go back"

* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: (23 commits)
  Revert "i915: Update VGA arbiter support for newer devices"
  Revert "drm/i915: Delay disabling of VGA memory until vgacon->fbcon handoff is done"
  drm/radeon: re-enable sw ACR support on pre-DCE4
  drm/radeon/dpm: disable bapm on TN asics
  drm/radeon: improve soft reset on CIK
  drm/radeon: improve soft reset on SI
  drm/radeon/dpm: off by one in si_set_mc_special_registers()
  drm/radeon/dpm/btc: off by one in btc_set_mc_special_registers()
  drm/radeon: forever loop on error in radeon_do_test_moves()
  drm/radeon: fix hw contexts for SUMO2 asics
  drm/radeon: fix typo in CP DMA register headers
  drm/radeon/dpm: disable multiple UVD states
  drm/radeon: use hw generated CTS/N values for audio
  drm/radeon: fix N/CTS clock matching for audio
  drm/radeon: use 64-bit math to calculate CTS values for audio (v2)
  drm/edid: catch kmalloc failure in drm_edid_to_speaker_allocation
  Revert "drm/fb-helper: don't sleep for screen unblank when an oops is in progress"
  drm/gma500: fix things after get/put page helpers
  drm/nouveau/mc: disable msi support by default, it's busted in tons of places
  drm/i915: Only apply DPMS to the encoder if enabled
  ...

ALSA: hda - Sony VAIO Pro 13 (haswell) now has a working headset jack

Just got the positive confirmation from a tester:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1227093/comments/28

Signed-off-by: David Henningsson <david.henningsson@canonical.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>

ALSA: hda - Add a headset mic model for ALC269 and friends

Using the headset mic model will cause the headset mic to be labeled
"headset mic" instead of just "mic".

Signed-off-by: David Henningsson <david.henningsson@canonical.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>

ALSA: hda - Fix microphone for Sony VAIO Pro 13 (Haswell model)

The external mic showed up with a precense detect of "always present",
essentially disabling the internal mic. Therefore turn off presence
detection for this pin.

Note: The external mic seems not yet working, but an internal mic is
certainly better than no mic at all.

Cc: stable@vger.kernel.org
BugLink: https://bugs.launchpad.net/bugs/1227093
Signed-off-by: David Henningsson <david.henningsson@canonical.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>

compiler/gcc4: Add quirk for 'asm goto' miscompilation bug

Fengguang Wu, Oleg Nesterov and Peter Zijlstra tracked down
a kernel crash to a GCC bug: GCC miscompiles certain 'asm goto'
constructs, as outlined here:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58670

Implement a workaround suggested by Jakub Jelinek.

Reported-and-tested-by: Fengguang Wu <fengguang.wu@intel.com>
Reported-by: Oleg Nesterov <oleg@redhat.com>
Reported-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Suggested-by: Jakub Jelinek <jakub@redhat.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: <stable@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>

Revert "i915: Update VGA arbiter support for newer devices"

This reverts commit 81b5c7bc8de3e6f63419139c2fc91bf81dea8a7d.

Adding drm/i915 into the vga arbiter chain means that X (in a piece of
well-meant paranoia) will do a get/put on the vga decoding around
_every_ accel call down into the ddx. Which results in some nice
performance disasters [1]. This really breaks userspace, by disabling
DRI for everyone, and stops OpenGL from working, this isn't limited
to just the i915 but both the integrated and discrete GPUs on
multi-gpu systems, in other words this causes untold worlds of pain,

Ville tried to come up with a Great Hack to fiddle the required VGA
I/O ops behind everyone's back using stop_machine, but that didn't
really work out [2]. Given that we're fairly late in the -rc stage for
such games let's just revert this all.

One thing we might want to keep is to delay the disabling of the vga
decoding until the fbdev emulation and the fbcon screen is set up. If
we kill vga mem decoding beforehand fbcon can end up with a white
square in the top-left corner it tried to save from the vga memory for
a seamless transition. And we have bug reports on older platforms
which seem to match these symptoms.

But again that's something to play around with in -next.

References: [1] http://lists.x.org/archives/xorg-devel/2013-September/037763.html
References: [2] http://www.spinics.net/lists/intel-gfx/msg34062.html
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Dave Airlie <airlied@redhat.com>

Revert "drm/i915: Delay disabling of VGA memory until vgacon->fbcon handoff is done"

This reverts commit 6e1b4fdad5157bb9e88777d525704aba24389bee.

This is part of a revert due to a userspace breakage, better explained in the revert of 1a1a4cbf4906a13c0c377f708df5d94168e7b582.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Dave Airlie <airlied@redhat.com>

Merge branch 'drm-fixes-3.12' of git://people.freedesktop.org/~agd5f/linux into drm-fixes

Regression fixes for audio and UVD, several hang fixes,
some DPM fixes.

* 'drm-fixes-3.12' of git://people.freedesktop.org/~agd5f/linux:
  drm/radeon: re-enable sw ACR support on pre-DCE4
  drm/radeon/dpm: disable bapm on TN asics
  drm/radeon: improve soft reset on CIK
  drm/radeon: improve soft reset on SI
  drm/radeon/dpm: off by one in si_set_mc_special_registers()
  drm/radeon/dpm/btc: off by one in btc_set_mc_special_registers()
  drm/radeon: forever loop on error in radeon_do_test_moves()
  drm/radeon: fix hw contexts for SUMO2 asics
  drm/radeon: fix typo in CP DMA register headers
  drm/radeon/dpm: disable multiple UVD states
  drm/radeon: use hw generated CTS/N values for audio
  drm/radeon: fix N/CTS clock matching for audio
  drm/radeon: use 64-bit math to calculate CTS values for audio (v2)
  drm/edid: catch kmalloc failure in drm_edid_to_speaker_allocation

dma: edma.c: remove edma_desc leakage

Free memory allocated to edma_desc when failing to allocate slot.

Signed-off-by: Geyslan G. Bem <geyslan@gmail.com>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>

rcar-hpbdma: add parameter to set_slave() method

Commit 4981c4dc194efb18f0e9a02f1b43e926f2f0d2bb (DMA: shdma: switch DT mode to
use configuration data from a match table) added a new parameter to set_slave()
method but unfortunately got merged later than commit c4f6c41ba790bbbfcebb4c47a
(dma: add driver for R-Car HPB-DMAC), so that the HPB-DMAC driver retained the
old prototype which caused this warning:

drivers/dma/sh/rcar-hpbdma.c:485: warning: initialization from incompatible
pointer type

The newly added parameter is used to override DMA slave address from 'struct
hpb_dmae_slave_config', so we have to add the 'slave_addr' field to 'struct
hpb_dmae_chan', conditionally assign it in set_slave() method, and return in
slave_addr() method.

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Tested-by: Simon Horman <horms+renesas@verge.net.au>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>

rcar-hpbdma: remove shdma_free_irq() calls

Commit c1c63a14f4f2419d093acd7164eccdff315baa86 (DMA: shdma: switch to managed
resource allocation) got rid of shdma_free_irq() but unfortunately got merged
later than commit c4f6c41ba790bbbfcebb4c47a709ac8ff1fe1af9 (dma: add driver for
R-Car HPB-DMAC), so that the HPB-DMAC driver retained the calls and got broken:

drivers/dma/sh/rcar-hpbdma.c: In function `hpb_dmae_alloc_chan_resources':
drivers/dma/sh/rcar-hpbdma.c:435: error: implicit declaration of function
`shdma_free_irq'

Fix this compilation error by removing the remaining shdma_free_irq() calls.

Reported-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Tested-by: Simon Horman <horms+renesas@verge.net.au>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>

Btrfs: fix oops caused by the space balance and dead roots

When doing space balance and subvolume destroy at the same time, we met
the following oops:

kernel BUG at fs/btrfs/relocation.c:2247!
RIP: 0010: [<ffffffffa04cec16>] prepare_to_merge+0x154/0x1f0 [btrfs]
Call Trace:
[<ffffffffa04b5ab7>] relocate_block_group+0x466/0x4e6 [btrfs]
[<ffffffffa04b5c7a>] btrfs_relocate_block_group+0x143/0x275 [btrfs]
[<ffffffffa0495c56>] btrfs_relocate_chunk.isra.27+0x5c/0x5a2 [btrfs]
[<ffffffffa0459871>] ? btrfs_item_key_to_cpu+0x15/0x31 [btrfs]
[<ffffffffa048b46a>] ? btrfs_get_token_64+0x7e/0xcd [btrfs]
[<ffffffffa04a3467>] ? btrfs_tree_read_unlock_blocking+0xb2/0xb7 [btrfs]
[<ffffffffa049907d>] btrfs_balance+0x9c7/0xb6f [btrfs]
[<ffffffffa049ef84>] btrfs_ioctl_balance+0x234/0x2ac [btrfs]
[<ffffffffa04a1e8e>] btrfs_ioctl+0xd87/0x1ef9 [btrfs]
[<ffffffff81122f53>] ? path_openat+0x234/0x4db
[<ffffffff813c3b78>] ? __do_page_fault+0x31d/0x391
[<ffffffff810f8ab6>] ? vma_link+0x74/0x94
[<ffffffff811250f5>] vfs_ioctl+0x1d/0x39
[<ffffffff811258c8>] do_vfs_ioctl+0x32d/0x3e2
[<ffffffff811259d4>] SyS_ioctl+0x57/0x83
[<ffffffff813c3bfa>] ? do_page_fault+0xe/0x10
[<ffffffff813c73c2>] system_call_fastpath+0x16/0x1b

It is because we returned the error number if the reference of the root was 0
when doing space relocation. It was not right here, because though the root
was dead(refs == 0), but the space it held still need be relocated, or we
could not remove the block group. So in this case, we should return the root
no matter it is dead or not.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>

Btrfs: insert orphan roots into fs radix tree

Now we don't drop all the deleted snapshots/subvolumes before the space
balance. It means we have to relocate the space which is held by the dead
snapshots/subvolumes. So we must into them into fs radix tree, or we would
forget to commit the change of them when doing transaction commit, and it
would corrupt the metadata.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>

Btrfs: limit delalloc pages outside of find_delalloc_range

Liu fixed part of this problem and unfortunately I steered him in slightly the
wrong direction and so didn't completely fix the problem.  The problem is we
limit the size of the delalloc range we are looking for to max bytes and then we
try to lock that range.  If we fail to lock the pages in that range we will
shrink the max bytes to a single page and re loop.  However if our first page is
inside of the delalloc range then we will end up limiting the end of the range
to a period before our first page.  This is illustrated below

[0 -------- delalloc range --------- 256mb]
                                  [page]

So find_delalloc_range will return with delalloc_start as 0 and end as 128mb,
and then we will notice that delalloc_start < *start and adjust it up, but not
adjust delalloc_end up, so things go sideways.  To fix this we need to not limit
the max bytes in find_delalloc_range, but in find_lock_delalloc_range and that
way we don't end up with this confusion.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>

Btrfs: use right root when checking for hash collision

btrfs_rename was using the root of the old dir instead of the root of the new
dir when checking for a hash collision, so if you tried to move a file into a
subvol it would freak out because it would see the file you are trying to move
in its current root. This fixes the bug where this would fail

btrfs subvol create test1
btrfs subvol create test2
mv test1 test2.

Thanks to Chris Murphy for catching this,

Cc: stable@vger.kernel.org
Reported-by: Chris Murphy <lists@colorremedies.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>

bcache: Fix a null ptr deref regression

Commit c0f04d88e46d ("bcache: Fix flushes in writeback mode") was fixing
a reported data corruption bug, but it seems some last minute
refactoring or rebasing introduced a null pointer deref.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Cc: linux-stable <stable@vger.kernel.org> # >= v3.10
Reported-by: Gabriel de Perthuis <g2p.code@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'hwmon-for-linus' of git://git./linux/kernel/git/groeck/linux-staging

Pull hwmon fix from Guenter Roeck:
"Fix root cause of crash/error seen in applesmc driver"

* tag 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: (applesmc) Always read until end of data

Merge branch 'rc-fixes' of git://git./linux/kernel/git/mmarek/kbuild

Pull kbuild fix from Michal Marek:
"Here is an ARM Makefile fix that you even acked.  After nobody wanted
  to take it, it ended up in the kbuild tree"

* 'rc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
  arm, kbuild: make "make install" not depend on vmlinux

Merge git://www.linux-watchdog.org/linux-watchdog

Pull watchdog fix from Wim Van Sebroeck:
"Make sure that the hpwdt driver will not load auxilary iLO devices"

* git://www.linux-watchdog.org/linux-watchdog:
watchdog: hpwdt: Patch to ignore auxilary iLO devices

kobject: show debug info on delayed kobject release

Useful for locating buggy drivers on kernel oops.

It may add dozens of new lines to boot dmesg. DEBUG_KOBJECT_RELEASE is
hopefully only enabled in debug kernels (like maybe the Fedora rawhide
one, or at developers), so being a bit more verbose is likely ok.

Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

watchdog: hpwdt: Patch to ignore auxilary iLO devices

This patch is to prevent hpwdt from loading on any auxilary iLO devices defined
after the initial (or main) iLO device. All auxilary iLO devices will have a
subsystem device ID set to 0x1979 in order for hpwdt to differentiate between
the two types.

Signed-off-by: Thomas Mingarelli <thomas.mingarelli@hp.com>
Tested-by: Lisa Mitchell <lisa.mitchell@hp.com>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>

Merge tag 'random_for_linus' of git://git./linux/kernel/git/tytso/random

Pull /dev/random changes from Ted Ts'o:
"These patches are designed to enable improvements to /dev/random for
  non-x86 platforms, in particular MIPS and ARM"

* tag 'random_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random:
  random: allow architectures to optionally define random_get_entropy()
  random: run random_int_secret_init() run after all late_initcalls

Merge tag 'for-linus' of git://git./virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
"Fixes for 3.12-rc5: two old PPC bugs and one new (3.12-rc2) x86 bug"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  kvm: ppc: booke: check range page invalidation progress on page setup
  KVM: PPC: Book3S HV: Fix typo in saving DSCR
  KVM: nVMX: fix shadow on EPT

Merge tag 'spi-v3.12-rc4' of git://git./linux/kernel/git/broonie/spi

Pull spi fixes from Mark Brown:
"This is all driver updates, mostly fixes for error handling paths
  except for the s3c64xx and hspi fixes for trying to use runtime PM
  before it is enabled and the pxa2xx fix for interactions between power
  management and interrupt handling"

* tag 'spi-v3.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
  spi: atmel: Fix incorrect error path
  spi/hspi: fixup Runtime PM enable timing
  spi/s3c64xx: Ensure runtime PM is enabled prior to registration
  spi/clps711x: drop clk_put for devm_clk_get in spi_clps711x_probe()
  spi: fix return value check in dspi_probe()
  spi: mpc512x: fix error return code in mpc512x_psc_spi_do_probe()
  spi: clps711x: Don't call kfree() after spi_master_put/spi_unregister_master
  spi/pxa2xx: check status register as well to determine if the device is off