Skip to content

Implement a few fixes and performance optimisations#2

Open
JakeChampion wants to merge 38 commits intounikraft:mainfrom
JakeChampion:jake/nimbly
Open

Implement a few fixes and performance optimisations#2
JakeChampion wants to merge 38 commits intounikraft:mainfrom
JakeChampion:jake/nimbly

Conversation

@JakeChampion
Copy link
Copy Markdown

each commit should be atomic and describe exactly the changes made - they can be cherry-picked instead of accepting all of them in a single PR if preferred :D

…res entries to be in strict alphabetical order. Previously, . and .. were hardcoded at the front of the directory listing. This broke lookup for filenames starting with ASCII characters that sort before . (0x2E) - specifically: ! " # $ % & ' ( ) * + , - (0x21-0x2D)
… writer

The '.' and '..' dirents were written with Nid 0 (zero value), which incorrectly resolves to the root inode regardless of actual parent. Set '.' to the current directory's own Nid and '..' to the parent directory's Nid by looking up their inodes during directory encoding.

Adds a root-level variant of the sorting test to cover the case where only '.' exists (no '..' entry).
ReadDir iterated map keys in non-deterministic order, violating the fs.ReadDirFile contract which requires entries sorted by name. This could cause non-deterministic fs.WalkDir traversal for any consumer.

The pagination logic also used n as an absolute index rather than a count from the current position, so the second call to ReadDir(2) would return entries 0-1 again instead of 2-3.
The writer computed and stored a CRC32C checksum but never set the FeatureCompatSuperBlockChecksum flag. The reader checks this flag before verifying, so corrupted images were silently accepted.
RootNid defaulted to 0 (zero value), which only worked because fs.WalkDir visits "." first, placing the root inode at NID 0. Read the root inode's assigned NID after the layout pass and set it in the superblock explicitly to avoid a silent breakage if inode ordering ever changes.
The getOwner functions in owner_unix.go and owner_windows.go were calling fi.Sys() twice per case branch - once in the switch and again to re-assert the same type. Use switch variable binding to capture the typed value directly.
firstPass was calling dataForInode for every inode just to get its size, then discarding the data. For regular files this meant opening and statting each file twice - once for size in firstPass, once for content in the write phase. Cache file sizes from populateInodes and use them directly in firstPass, reducing regular file opens to exactly one per file. Also removes redundant type assertions in owner_unix.go and owner_windows.go.
The fs.ReadDirFile contract requires that when n > 0 and no entries remain, ReadDir returns (nil, io.EOF). Previously it returned an empty slice with nil error, which could cause callers using paginated reads to miss the end-of-directory signal.
Extended inodes store nanosecond-precision modification times via the MtimeNsec field, but ModTime() was constructed with time.Unix(mtime, 0), discarding the nanosecond component. Now uses the stored MtimeNsec value.
The compact inode decision compared fi.ModTime() == time.Time{}, but time.Time{} has a nil Location while some FS implementations return the equivalent zero time with a UTC Location (e.g. time.Time{}.UTC()). These represent the same instant but are not == equal. Using IsZero() correctly handles all representations of the zero time.
The fs.FS contract requires that Open(".").Stat().Name() returns ".". The root dir was created with name "" in New(), and Sub() roots also returned their base name rather than ".". Track the opened name in the fhDir handle so Stat() returns the correct value.
…tract

Type() was returning the full mode (including permission bits) instead of just the type bits. For regular files this meant returning e.g. 0o644 instead of 0, and for directories it included permission bits alongside ModeDir. Also changed the error fallback from 0 (regular file) to ModeIrregular to avoid misidentifying broken entries.
StatLink omitted the image field when constructing fileInfo, unlike Stat and dirEntry.Info which both set it. This inconsistency could cause issues for downstream code that accesses the image through the fileInfo struct.
filepath.Join already calls Clean on its result, so wrapping it in an additional Clean call is unnecessary.
…tract

Type() was returning the full mode including permission bits instead of just the type bits. For a directory with perm 0755 it returned ModeDir|0755 instead of ModeDir, and for regular files it returned e.g. 0644 instead of 0. This could break code using exact switch matches on Type().
Go convention requires char devices to have both ModeDevice and ModeCharDevice set. The reader only set ModeCharDevice, and the writer's switch cases matched bare ModeCharDevice which would never match the combined flags from a properly-constructed FileMode. Fixed all three conversion paths and added internal unit tests.
When creating an EROFS image from an existing EROFS filesystem, fi.Sys() returns *erofs.Inode which getOwner didn't handle. UID and GID silently became 0. Added a case for *Inode in both unix and windows variants of getOwner, and a round-trip test verifying ownership survives.
The file struct stored a reference to the Image that was never read by any method. File data access goes through de.getInode() which uses the dirEntry's own image reference.
File.Stat().Size() was backed by bytes.Buffer.Len() which returns unread bytes, not total file size. After a partial Read(), Size() would shrink. Added a dedicated size field captured at open time so Size() always reflects the full file length.
…StatLink

The fs.FS contract requires implementations to reject invalid paths (leading slashes, ".." components, etc.) with ErrInvalid. Updated tests to use valid relative paths instead of leading-slash paths.
tarfs lacked Lstat, so it didn't satisfy Go 1.24's fs.ReadLinkFS. Also, getIno() returns 0 for filesystems without real inode numbers (memfs, tarfs), causing all non-directory files to be falsely grouped as hardlinks. Fall back to simple NID assignment when fsIno is 0.
The reader masked with fs.ModePerm (0o777), discarding bits 9-11 (S_ISUID, S_ISGID, S_ISVTX). The writer correctly encoded these via statModeFromFileMode(), but the reader never converted them back, silently losing them on round-trip.
The resolve() function followed symlinks recursively with no depth limit. A tar archive with circular symlinks (e.g. a -> b, b -> a) would cause a stack overflow. Added a maxSymlinks = 40 limit matching the Linux kernel MAXSYMLINKS constant, consistent with the erofs implementation.
The implicit parent directory creation loop unconditionally overwrote directory entries with defaults (mode 0o755). If the tar archive contained an explicit directory entry with specific permissions or timestamps before its child files, that metadata was silently replaced. Now only creates a default entry when no entry exists for that path.
…node

Type() was loading the full inode from disk just to extract the type bits, which were already available in the dirEntry.typ field. This caused unnecessary I/O and silently returned 0 (regular file) when the inode failed to load. Now maps directly from the dirent's file type, consistent with how IsDir() already works.
Was returning a bare errors.New("not a directory"). Now wraps it in fs.PathError with op and path, consistent with the other Filesystem methods.
…g them

getInode() stored errors in inodeErr but no caller ever checked it. A failed inode load returned a zero-value Inode with nil image, causing wrong results from Info()/Type() and nil-pointer panics from Data()/Readlink()/Lookup(). All callers now check de.inodeErr and return it.
…Lstat, StatLink

The fs.FS contract requires Open to reject names that don't satisfy fs.ValidPath. None of the tarfs public methods validated paths. Test entries with inherently invalid fs.FS names (non-UTF8, trailing slashes) are now skipped since they cannot be accessed through the fs.FS interface.
FeatureCompatSuperBlockChecksum in reader.go and EROFS_FEATURE_COMPAT_SB_CHKSUM in constants.go had the same value. Removed the duplicate and unified on the constants.go name.
Was discarding the underlying error and returning a generic "image size is too small" message. Now wraps the original error for better diagnostics.
filepath functions are OS-dependent (backslash handling on Windows). fs.FS paths are always forward-slash separated, so the path package is correct. Replaced all filepath.Dir/Base/Clean/IsAbs/ToSlash calls with their path or strings equivalents.
Open() unconditionally called d.data() which is nil for directories, causing a nil pointer dereference. Now returns a dirFile implementing fs.ReadDirFile with sorted, paginated ReadDir(n) and proper io.EOF semantics, matching the fs.FS contract.
The fs.FS contract requires that Open on a directory returns an fs.ReadDirFile. Previously it returned a plain file that lacked ReadDir(n), so callers using Open then type-asserting to fs.ReadDirFile would fail. Added dirFile type with paginated ReadDir and proper io.EOF semantics.
The lock-unlock-relock pattern between parent and child directories allowed concurrent MkdirAll calls on overlapping paths to race. Both could see child == nil, both create a new dir, and the second would overwrite the first, orphaning its subtree. Now locks the child before unlocking the parent.
Add full symbolic link support including creation (Symlink), reading (ReadLink), and stat without following (Lstat, StatLink). Path resolution follows symlinks transparently with cycle detection (maxSymlinks=40). MkdirAll, WriteFile, Open, Sub all work through symlink directories. DirEntry.Type() correctly reports ModeSymlink.
@JakeChampion JakeChampion marked this pull request as ready for review March 3, 2026 14:22
@craciunoiuc craciunoiuc self-requested a review April 14, 2026 14:46
@craciunoiuc
Copy link
Copy Markdown
Member

Will start to go through this one by one, stay tuned 👀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants