marcelmay / hfsa
Hadoop FSImage Analyzer (HFSA)
License: Apache License 2.0
Check if a directory contains any children.
Print a warning when the max JVM heap is less than 2x the fsimage size, and recommend a JVM heap size.
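A minimal sketch of such a check, assuming the 2x rule is applied to the fsimage file length and to `Runtime.maxMemory()`. The class name `HeapCheck`, the helper names, and the message wording are hypothetical and not taken from hfsa:

```java
import java.io.File;
import java.util.Locale;

// Hypothetical helper, not part of hfsa: warns when max heap < 2x fsimage size.
final class HeapCheck {
    /** Returns a warning text, or null if the heap is at least 2x the fsimage size. */
    static String heapWarning(long maxHeapBytes, long fsImageBytes) {
        long recommended = Math.multiplyExact(2L, fsImageBytes);
        if (maxHeapBytes >= recommended) {
            return null;
        }
        return String.format(Locale.ROOT,
            "Max JVM heap (%d MiB) is less than 2x fsimage size (%d MiB); consider -Xmx%dm",
            toMiB(maxHeapBytes), toMiB(fsImageBytes), toMiB(recommended));
    }

    /** Converts bytes to MiB, rounding up. */
    static long toMiB(long bytes) {
        return (bytes + (1L << 20) - 1) >> 20;
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("usage: HeapCheck <fsimage file>");
            return;
        }
        long fsImageBytes = new File(args[0]).length();
        String warning = heapWarning(Runtime.getRuntime().maxMemory(), fsImageBytes);
        if (warning != null) {
            System.err.println(warning);
        }
    }
}
```

Keeping the comparison in a pure function makes the threshold easy to test without a real fsimage.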
Testing only:
Report the top small files
Memory cost for 1 million inodes is about 8 MiB (~ number of inodes * sizeof(long)).
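The estimate can be reproduced with a few lines of arithmetic. This is only an illustration of the formula, assuming one 8-byte `long` per inode and no other overhead; `InodeIndexCost` is a hypothetical name:

```java
// Hypothetical illustration of the estimate: one long (8 bytes) per inode.
final class InodeIndexCost {
    static long estimateBytes(long inodeCount) {
        // Long.BYTES is 8; multiplyExact guards against silent overflow.
        return Math.multiplyExact(inodeCount, (long) Long.BYTES);
    }

    public static void main(String[] args) {
        long bytes = estimateBytes(1_000_000L);
        System.out.printf("%d bytes = %.2f MiB%n", bytes, bytes / (1024.0 * 1024.0));
    }
}
```

For 1 million inodes this gives 8,000,000 bytes, which is roughly 7.6 MiB.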
Add helper for handling block storage policy.
For consistency with java.io.File and unix, support POSIX-like double slashes in paths when querying FSImages.
Path examples:
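As a sketch of the intended behaviour, repeated slashes could be collapsed before a path is looked up. The helper `PathNormalizer.normalize` and the sample paths are hypothetical and are not the hfsa implementation:

```java
// Hypothetical helper: collapses runs of '/' into a single '/'.
final class PathNormalizer {
    static String normalize(String path) {
        return path.replaceAll("/{2,}", "/");
    }

    public static void main(String[] args) {
        // Hypothetical sample paths, for illustration only.
        System.out.println(normalize("//user//foo"));
        System.out.println(normalize("/user/foo"));
    }
}
```

With this approach, `//user//foo` and `/user/foo` resolve to the same normalized path.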
Occurs when very large files trigger auto-extension of the bucket size (see #5).
Refactor inode handling (storing, parsing, extracting) into a new class to enable future alternative implementations (inodes as a single huge byte array, a (direct memory) ByteBuffer, etc.).
Check if the value is null instead of first testing if the key exists:
```
if(map.contains(key)) {
... map.get(key);
}
```
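A minimal sketch of the single-lookup variant, using `java.util.Map` as a stand-in for whichever map type the code actually uses. It assumes the map never stores `null` values, otherwise a missing key and a `null` value cannot be told apart. `SingleLookup` and `sizeOrDefault` are hypothetical names:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical example: one map lookup instead of contains + get.
final class SingleLookup {
    static long sizeOrDefault(Map<String, Long> map, String key, long fallback) {
        Long value = map.get(key);            // single hash lookup
        return value != null ? value : fallback;
    }

    public static void main(String[] args) {
        Map<String, Long> sizes = new HashMap<>();
        sizes.put("/user/foo", 42L);
        System.out.println(sizeOrDefault(sizes, "/user/foo", 0L));
        System.out.println(sizeOrDefault(sizes, "/user/bar", 0L));
    }
}
```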
Currently, the hfsa lib fails to process compressed fsimages (see marcelmay/hadoop-hdfs-fsimage-exporter/issues/34).
FSImageUtil.wrapInputStreamForCompression
Support CLI option -v for INFO or -vv for DEBUG output.
Support Hadoop 2.x and 3.x fsimage versions.
Hadoop 3 changed the fsimage layout when introducing the new HDFS erasure coding feature.
Update com.google.guava:guava:21.0 to latest 24.0-jre
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException
at java.lang.System.arraycopy(Native Method)
at de.m3y.hadoop.hdfs.hfsa.tool.FormatUtil.boxAndPadWithZeros(FormatUtil.java:77)
at de.m3y.hadoop.hdfs.hfsa.tool.HdfsFSImageTool.doSummary(HdfsFSImageTool.java:134)
at de.m3y.hadoop.hdfs.hfsa.tool.HdfsFSImageTool.doPerform(HdfsFSImageTool.java:97)
at de.m3y.hadoop.hdfs.hfsa.tool.HdfsFSImageTool.main(HdfsFSImageTool.java:314)
Simplifies test assertions.
Update overall dependencies:
Test scope:
Fine tune for smaller footprint, as Hadoop Maven dependencies (transitively) pull in a lot.
For reuse, e.g. in https://github.com/marcelmay/hadoop-hdfs-fsimage-exporter
Support printing out the hfsa version via option -V
The current guava version 27.0-jre results in a shade warning, which is fixed in 27.0.1-jre:
[WARNING] failureaccess-1.0.jar, guava-27.0-jre.jar define 2 overlapping classes:
[WARNING] - com.google.common.util.concurrent.internal.InternalFutureFailureAccess
[WARNING] - com.google.common.util.concurrent.internal.InternalFutures
See guava issue 3302 for details.
Auto-resizing grows the buckets to exactly one bucket fewer than required.
Print inode content details, e.g. for verification or debugging.
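This kind of off-by-one typically comes from resizing to the zero-based bucket index instead of to index + 1. The sketch below uses a simplified, hypothetical power-of-two bucket model (`GrowingBuckets`), not the actual `SizeBucket` code:

```java
import java.util.Arrays;

// Hypothetical, simplified bucket counter: bucket 0 holds size 0,
// bucket i (i >= 1) holds sizes in [2^(i-1), 2^i).
final class GrowingBuckets {
    private long[] counts = new long[0];

    static int bucketIndex(long size) {
        if (size < 0) {
            throw new IllegalArgumentException("size must be >= 0: " + size);
        }
        return 64 - Long.numberOfLeadingZeros(size);
    }

    void add(long size) {
        int index = bucketIndex(size);
        if (index >= counts.length) {
            // The index is zero-based, so the array needs index + 1 slots.
            // Resizing to 'index' would be one bucket too few.
            counts = Arrays.copyOf(counts, index + 1);
        }
        counts[index]++;
    }

    int bucketCount() {
        return counts.length;
    }

    long count(int index) {
        return index < counts.length ? counts[index] : 0L;
    }
}
```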
Tool incorrectly reports symlinks.
Increase from default 8KiB to 64KiB, for FSImage files on non-SSD HDs.
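A minimal sketch of the idea, assuming the value in question is a stream read buffer: `java.io.BufferedInputStream` defaults to an 8 KiB buffer, and a larger one can be passed to its constructor. `BufferedImageReader` is a hypothetical class, not the hfsa loader:

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical reader showing a 64 KiB read buffer instead of the 8 KiB default.
final class BufferedImageReader {
    static final int BUFFER_SIZE = 64 * 1024;

    static InputStream open(Path file) throws IOException {
        return new BufferedInputStream(Files.newInputStream(file), BUFFER_SIZE);
    }

    /** Reads the whole file through the buffered stream and returns its length. */
    static long countBytes(Path file) throws IOException {
        long total = 0;
        byte[] chunk = new byte[8192];
        try (InputStream in = open(file)) {
            int n;
            while ((n = in.read(chunk)) != -1) {
                total += n;
            }
        }
        return total;
    }
}
```

Whether 64 KiB is the best value depends on the storage; it is worth measuring on the target disks.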
Dependency | Version |
---|---|
org.apache.hadoop:hadoop-common | 2.7.3 -> 2.7.5 |
org.apache.hadoop:hadoop-hdfs | 2.7.3 -> 2.7.5 |
commons-io:commons-io | 2.5 -> 2.6 |
When calling computeBucketUpperBorders on a newly created instance:
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 1
at de.m3y.hadoop.hdfs.hfsa.util.SizeBucket$Bucket2nModel.computeBucketUpperBorders(SizeBucket.java:78)
at de.m3y.hadoop.hdfs.hfsa.util.SizeBucket.computeBucketUpperBorders(SizeBucket.java:124)
Micro-tune file creation from about 9 s to about 7 s per 100 dirs / 26k files.
Using a fsimage file, show all files and directories owned by a specified list of users
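One way to approach this is to filter the traversed entries against a set of user names. The sketch below is independent of the hfsa API: `Entry` is a hypothetical stand-in for whatever the fsimage traversal yields, and `OwnerFilter.pathsOwnedBy` is an illustrative name:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical filter: keeps paths whose owner is in the given user set.
final class OwnerFilter {
    /** Hypothetical entry type standing in for a file or directory inode. */
    static final class Entry {
        final String path;
        final String owner;

        Entry(String path, String owner) {
            this.path = path;
            this.owner = owner;
        }
    }

    static List<String> pathsOwnedBy(List<Entry> entries, Set<String> users) {
        List<String> result = new ArrayList<>();
        for (Entry e : entries) {
            if (users.contains(e.owner)) {   // O(1) membership test per entry
                result.add(e.path);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Entry> entries = Arrays.asList(
            new Entry("/user/alice", "alice"),
            new Entry("/user/bob/data", "bob"),
            new Entry("/tmp", "hdfs"));
        Set<String> users = new HashSet<>(Arrays.asList("alice", "bob"));
        pathsOwnedBy(entries, users).forEach(System.out::println);
    }
}
```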
The root INode always has the same INode ID and can easily be cached to avoid repeated reconstruction from the INode bytes cache.
Files directly under root receive the full path including file name in FsVisitor.onFile(...).
E.g. you might only see an NPE, but not that there was an OutOfMemoryError when loading image data.
This regression was introduced when fixing #40.
Completely skip sections of no interest
Provide a CLI toggle for the fsimage generator for enabling/disabling compression.
Update info.picocli:picocli from 3.7.0 to 3.9.6
I cannot find the change time and access time of a file. Is that stat available?
If not, can the project be upgraded to ?
Thank you.