Comments (9)
Booting with zlog=kalloc.256 and using findoldest, zstack and countpcs, we find this situation:
(gdb) findoldest
oldest record is at log index 393:
--------------- ALLOC 0xffffff803276ec00 : index 393 : ztime 21643824 -------------
0xffffff800024352e <zalloc_canblock+78>: mov %eax,-0xcc(%rbp)
0xffffff80002245bd <get_zone_search+23>: jmpq 0xffffff80002246d8 <KALLOC_ZINFO_SALLOC+35>
0xffffff8000224c39 <OSMalloc+89>: mov %rax,-0x18(%rbp)
0xffffff7f80e847df <zfs_kmem_alloc+15>: mov %rax,%r15
0xffffff7f80e90649 <arc_buf_alloc+41>: mov %rax,-0x28(%rbp)
(gdb) countpcs 0xffffff7f80e90649 # arc_buf_alloc
occurred 2390 times in log (59% of records)
So it is less of a leak and more of a "where is the ARC reclaim?" situation.
Other stacks are
--------------- ALLOC 0xffffff8012568300 : index 700 : ztime 22130651 -------------
0xffffff800024352e <zalloc_canblock+78>: mov %eax,-0xcc(%rbp)
0xffffff80002245bd <get_zone_search+23>: jmpq 0xffffff80002246d8 <KALLOC_ZINFO_SALLOC+35>
0xffffff8000224c39 <OSMalloc+89>: mov %rax,-0x18(%rbp)
0xffffff7f80e847df <zfs_kmem_alloc+15>: mov %rax,%r15
0xffffff7f80ea0cbb <dbuf_create+59>: mov %rax,-0x40(%rbp)
0xffffff7f80ea0900 <__dbuf_hold_impl+1360>: mov -0x68(%rbp),%rcx
0xffffff7f80ea02fc <dbuf_hold_impl+140>: mov $0x780,%rsi
0xffffff7f80ea0bf2 <dbuf_hold+50>: mov %eax,-0x24(%rbp)
0xffffff7f80ea83bb <dmu_buf_hold_array_by_dnode+619>: mov %rax,-0x98(%rbp)
0xffffff7f80ea99d9 <dmu_write_uio_dnode+121>: mov %eax,-0x38(%rbp)
0xffffff7f80ea993c <dmu_write_uio_dbuf+108>: mov %eax,-0x3c(%rbp)
0xffffff7f80f5494b <zfs_write+2923>: mov %eax,-0xa0(%rbp)
0xffffff7f80f605e8 <zfs_vnop_write+88>: mov %eax,-0x10(%rbp)
0xffffff8000311d62 <VNOP_WRITE+18>: mov %eax,%ebx
0xffffff8000308139 <vn_write+617>: mov %eax,%r14d
But as it is a dmu_hold, it should be short-lived.
(gdb) countpcs dbuf_create+59
occurred 1578 times in log (39% of records)
from zfs.
Ok, seems to be just ARC running wild. With arc_max / 2 it still happens, but with arc_max / 4 iozone has completed.
My VM has 2GB so that is quite conservative (250MB arc?)
from zfs.
It would seem that OSX keeps the memory allocations per size. So kalloc.256, kalloc.512, kalloc.1024, kalloc.4096 etc.
Even though we are staying under our self-imposed limit, we can in fact run out of a specific size well before then, usually 512 or 1024. We may need to keep an internal tally of the sizes as well (or explore a way to ask Darwin for those statistics).
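A minimal sketch of what such an internal tally could look like, in plain userland C. Everything here is an assumption for illustration: the class list, the tally_* names, and the lack of locking (a kernel version would need atomics or a lock). It only counts live allocations per size class, so that pressure on a single class can be seen even when the overall total is under the limit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical size classes, loosely mirroring the kalloc.N zones above. */
static const size_t tally_class_size[] = { 256, 512, 1024, 2048, 4096 };
#define TALLY_NCLASSES (sizeof(tally_class_size) / sizeof(tally_class_size[0]))

/* Live allocation count per class; the extra last slot holds anything
 * larger than the biggest class. Not thread-safe in this sketch. */
static uint64_t tally_in_use[TALLY_NCLASSES + 1];

/* Map a request size to the smallest class that can hold it.
 * Requests smaller than the first class are counted in the first class. */
static size_t tally_class_index(size_t size)
{
    for (size_t i = 0; i < TALLY_NCLASSES; i++) {
        if (size <= tally_class_size[i])
            return i;
    }
    return TALLY_NCLASSES;
}

/* Record one allocation of the given size. */
void tally_alloc(size_t size)
{
    tally_in_use[tally_class_index(size)]++;
}

/* Record one free of the given size. */
void tally_free(size_t size)
{
    size_t i = tally_class_index(size);
    assert(tally_in_use[i] > 0);
    tally_in_use[i]--;
}

/* Number of live allocations in the class that 'size' falls into. */
uint64_t tally_count(size_t size)
{
    return tally_in_use[tally_class_index(size)];
}
```

Calling tally_alloc() and tally_free() from the allocation wrappers (which already receive the size) would be enough to feed it.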
Currently all ZFS memory is kmem. Can Darwin do Linux-style vmem allocations as well? Especially now that we are linking with IOKit.
from zfs.
I had a look into this, and I think I have an idea how to fix this but I need to check a bit more. Hopefully I can say more over the weekend.
from zfs.
Basically, we don't appear to be reclaiming arc as we should. We are still using the arc from ZOL, which relies on SPL doing a reclaim callback. Clearly we don't do this either, so current thoughts are to also port over the FreeBSD arc.c to the project.
I added "total_allocated" to SPL layer (since all our memory allocations go through there) as well as dumping the arc every second, and the graphs look like this.
Note the very first drop way over on the left axis, that is arc having warmed up and doing first reclaim. The top is pretty much where the line should be, that we never go over.
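For reference, a counter like "total_allocated" can be sketched roughly as follows. This is a userland stand-in and not the actual SPL code: malloc()/free() take the place of the kernel allocator, and the counted_* names are made up for the example. It relies on the kmem-style convention that the caller passes the size to free as well.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Bytes currently allocated through the wrappers below. */
static _Atomic uint64_t total_allocated;

/* Allocate 'size' bytes and add them to the running total on success. */
void *counted_alloc(size_t size)
{
    void *p = malloc(size);
    if (p != NULL)
        atomic_fetch_add(&total_allocated, size);
    return p;
}

/* Free a buffer obtained from counted_alloc(); 'size' must match the
 * size passed at allocation time, as with kmem_free(). */
void counted_free(void *p, size_t size)
{
    if (p == NULL)
        return;
    uint64_t before = atomic_fetch_sub(&total_allocated, size);
    assert(before >= size); /* catches mismatched sizes or double frees */
    free(p);
}

/* Snapshot of the current total, e.g. for a once-per-second dump. */
uint64_t counted_total(void)
{
    return atomic_load(&total_allocated);
}
```

Sampling counted_total() on a timer next to the ARC size gives the two lines that such a graph compares.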
from zfs.
Ok, I'm 40% sure this might have something to do with it;
void kmem_cache_reap_now(kmem_cache_t *cache)
{
}
from zfs.
Ok, kmem_cache_reap_now() is not really part of the reclaim. It is handled with calls to arc_adjust(), which calls arc_evict(). We were in fact evicting fine, and all 6 lists (mru/mfu/ghosts * 2) went to zero. It was zfs.arc_meta_used which climbed upwards, most likely due to a sa_buf_hold() call never released.
At the moment, ARC appears to behave as expected, keeping a 2GB machine up by evicting.
from zfs.
The reclaim thread is working surprisingly well, and we probably should not have waited this long to implement it.
We do have a lingering issue with ARC: sometimes zfs.arc_meta_used will climb way out of control. We are trying to evict, so presumably we are leaking buffers, or similar. It can trigger the panic for this Issue.
from zfs.
With the latest commits (3c65e48) we have managed to clean up most issues with ARC. It no longer balloons. We most likely had one missed sa_buf_free() in znode, and the reclaim thread would stop. Changing cv_signal() to cv_broadcast() solved that issue. We should investigate cv_signal().
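Why cv_signal() was not enough here is still unknown. As a general illustration of the difference only, here is a small sketch using POSIX condition variables as a stand-in for the SPL cv_* calls (the names waiter, broadcast_demo and MAX_WAITERS are made up for the example). Several threads wait on one condition, and a single broadcast releases all of them.

```c
#include <assert.h>
#include <pthread.h>

#define MAX_WAITERS 16

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cv = PTHREAD_COND_INITIALIZER;
static int go;    /* predicate, guarded by lock */
static int woken; /* waiters that saw go != 0, guarded by lock */

/* Each waiter blocks until the predicate becomes true. */
static void *waiter(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock);
    while (!go)
        pthread_cond_wait(&cv, &lock);
    woken++;
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* Start 'nwaiters' threads, release them with one broadcast, and
 * return how many of them woke up. */
int broadcast_demo(int nwaiters)
{
    pthread_t tid[MAX_WAITERS];
    int started = 0;

    assert(nwaiters >= 0 && nwaiters <= MAX_WAITERS);

    pthread_mutex_lock(&lock);
    go = 0;
    woken = 0;
    pthread_mutex_unlock(&lock);

    for (int i = 0; i < nwaiters; i++) {
        if (pthread_create(&tid[started], NULL, waiter, NULL) == 0)
            started++;
    }

    pthread_mutex_lock(&lock);
    go = 1;
    pthread_cond_broadcast(&cv); /* wakes every thread currently waiting */
    pthread_mutex_unlock(&lock);

    for (int i = 0; i < started; i++)
        pthread_join(tid[i], NULL);
    return woken;
}
```

With pthread_cond_signal() in place of the broadcast, only one waiter is guaranteed to be woken per call, so the remaining threads could stay blocked and the joins would hang. Whether something similar happens with more than one thread waiting on the reclaim condition is an assumption that would need checking.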
For now, even 2 rsyncs from root will complete, with ARC staying level. We may need to revisit ARC work in future for tweaks, but I consider the original issue to be avoided.
from zfs.