
HYPERFLEET-543 - feat: Implement Cluster and NodePool DELETE endpoints with cascade logic#106

Open
mliptak0 wants to merge 15 commits into openshift-hyperfleet:main from mliptak0:HYPERFLEET-543

Conversation


@mliptak0 mliptak0 commented Apr 14, 2026

Summary

  • Implemented Cluster and NodePool soft-delete endpoints

  • Refactored DAO methods to remove Unscoped calls when fetching Clusters and NodePools. Previously, GORM's built-in soft-delete mechanism excluded rows with a non-null deleted_at from all queries; the new deleted_time column carries no such implicit filter, and Delete queries now perform hard deletes.

  • Bumped oapi-codegen version to fix missing omitempty on generated response objects

  • Updated OpenAPI spec with examples for Cluster and NodePool schemas

  • HYPERFLEET-543

Test Plan

  • Unit tests added/updated
  • make test-all passes
  • make lint passes
  • Helm chart changes validated with make test-helm (if applicable)
  • Deployed to a development cluster and verified (if Helm/config changes)
  • E2E tests passed (if cross-component or major changes)

Summary by CodeRabbit

  • New Features

    • Added DELETE endpoints for clusters and cluster-scoped node pools (returns 202 Accepted).
    • Soft-deletion now records who requested deletion and when; responses include deleted_time and deleted_by.
    • Deleting a cluster cascades soft-deletion to its node pools.
  • Updates

    • API version bumped to 1.0.6.
    • Dependency upgrades and tooling updates for code generation and runtime.

mliptak0 added 7 commits April 8, 2026 16:05
…rated response objects, newer version fixes this
…endpoints to include soft-deleted records for adapters to be able to fetch them by using Unscoped methods (TODO: Find out if it's possible to use previous queries but without implicit filter for soft-deleted objects provided by gorm)
…ng deletion to associated adapter statuses. Update OpenAPI definitions to include `deleted_at` timestamps for tracking deletion requests. Enhance DAO methods to support deletion requests and ensure proper handling in service layers.
…ing cascading deletion of associated adapter statuses. Implement RequestDeletion methods in DAOs and update tests to verify deletion behavior, including idempotency and proper handling of already deleted resources.
…nd NodePools. Introduce `deleted_time` and `deleted_by` fields for tracking deletion requests, and enhance DAO methods to support cascading deletion behavior. Update tests to verify proper handling of deletion requests and ensure idempotency.
…or parameter from RequestDeletion methods. Update deletion logic to use a default system identity for tracking. Enhance OpenAPI definitions with examples for Cluster and NodePool schemas, including detailed status conditions.
…y field instead of account email. This change ensures consistency in tracking deletion requests across tests.

coderabbitai bot commented Apr 14, 2026

Walkthrough

This pull request implements soft-deletion for Clusters and NodePools: new DELETE endpoints return HTTP 202 Accepted, and services/DAOs expose RequestDeletion methods that set deleted_time and deleted_by instead of hard-deleting. Models (Cluster, NodePool, NodePoolCreateResponse) gain deleted_time/deleted_by fields; Meta's GORM DeletedAt was removed. DB migrations rename deleted_at → deleted_time, add deleted_by, and change unique indexes to apply unconditionally. Deletion cascades from Cluster → NodePools. Tests, presenters, handlers, and codegen/module versions were updated accordingly.

Sequence Diagram(s)

sequenceDiagram
    actor Client
    participant Handler as ClusterHandler
    participant Service as ClusterService
    participant ClusterDAO as ClusterDAO
    participant NodePoolDAO as NodePoolDAO
    participant DB as Database

    Client->>Handler: DELETE /api/hyperfleet/v1/clusters/{id}
    Handler->>Service: RequestDeletion(ctx, id)
    Service->>ClusterDAO: RequestDeletion(ctx, id)
    ClusterDAO->>DB: SELECT cluster WHERE id=?
    DB-->>ClusterDAO: cluster
    alt already deleted
        ClusterDAO-->>Service: cluster, alreadyDeleted=true
    else mark deleted
        ClusterDAO->>DB: UPDATE clusters SET deleted_time=now(), deleted_by='system@hyperfleet.local', generation=generation+1
        DB-->>ClusterDAO: success
        ClusterDAO-->>Service: cluster, alreadyDeleted=false
    end
    Service->>NodePoolDAO: RequestDeletionByOwner(ctx, cluster.id, timeNow)
    NodePoolDAO->>DB: UPDATE node_pools SET deleted_time=now(), deleted_by='system@hyperfleet.local', generation=generation+1 WHERE owner_id=? AND deleted_time IS NULL
    DB-->>NodePoolDAO: success
    ClusterDAO-->>Service: cluster
    Service-->>Handler: cluster
    Handler->>Handler: PresentCluster(cluster)
    Handler-->>Client: 202 Accepted + cluster JSON
sequenceDiagram
    actor Client
    participant Handler as ClusterNodePoolsHandler
    participant ClusterSvc as ClusterService
    participant NodePoolSvc as NodePoolService
    participant NodePoolDAO as NodePoolDAO
    participant DB as Database

    Client->>Handler: DELETE /api/hyperfleet/v1/clusters/{cluster_id}/nodepools/{nodepool_id}
    Handler->>ClusterSvc: Get(ctx, cluster_id)
    ClusterSvc-->>Handler: cluster
    Handler->>NodePoolSvc: Get(ctx, nodepool_id)
    NodePoolSvc-->>Handler: nodePool
    Handler->>Handler: verify nodePool.OwnerID == cluster_id
    Handler->>NodePoolSvc: RequestDeletion(ctx, nodepool_id)
    NodePoolSvc->>NodePoolDAO: RequestDeletion(ctx, nodepool_id)
    NodePoolDAO->>DB: SELECT node_pool WHERE id=?
    DB-->>NodePoolDAO: nodePool
    alt already deleted
        NodePoolDAO-->>NodePoolSvc: nodePool, alreadyDeleted=true
    else mark deleted
        NodePoolDAO->>DB: UPDATE node_pools SET deleted_time=now(), deleted_by='system@hyperfleet.local', generation=generation+1
        DB-->>NodePoolDAO: success
        NodePoolDAO-->>NodePoolSvc: nodePool, alreadyDeleted=false
    end
    NodePoolSvc-->>Handler: nodePool
    Handler->>Handler: PresentNodePool(nodePool)
    Handler-->>Client: 202 Accepted + nodePool JSON

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage (⚠️ Warning): docstring coverage is 44.44%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.

✅ Passed checks (2 passed)

  • Description Check (✅ Passed): check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check (✅ Passed): the title accurately summarizes the main change, implementing DELETE endpoints for Cluster and NodePool resources with cascade deletion logic.



@openshift-ci openshift-ci bot requested review from tirthct and vkareh April 14, 2026 09:33
…nd NodePools (not needed as we no longer use GORM's soft-delete)
@mliptak0 mliptak0 marked this pull request as draft April 14, 2026 09:50
@mliptak0 mliptak0 changed the title Hyperfleet 543 Hyperfleet 543 - feat: Implement Cluster and NodePool DELETE endpoints with cascade logic Apr 14, 2026
@mliptak0 mliptak0 marked this pull request as ready for review April 14, 2026 12:32
@openshift-ci openshift-ci bot requested a review from rafabene April 14, 2026 12:32

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@openapi/openapi.yaml`:
- Around line 172-175: Update the DELETE response examples in
openapi/openapi.yaml so the deleted_by field matches the actual deletion actor
used by the DAO (system@hyperfleet.local) instead of an end-user email; locate
the examples that include the deleted_by property (the blocks around the
occurrences currently at the ranges showing
deleted_time/created_by/updated_by/deleted_by and the other occurrence at
367-370) and replace the end-user email with system@hyperfleet.local for
deleted_by so the public contract reflects the real actor hardcoded by the DAO.

In `@pkg/dao/mocks/node_pool.go`:
- Around line 38-52: The mock methods on nodePoolDaoMock (RequestDeletion,
Delete, FindByOwnerID, RequestDeletionByOwner) currently return NotImplemented;
implement them with an in-memory backing store (e.g., the existing mock's
map/slice) to mirror pkg/dao/node_pool.go semantics: RequestDeletion should mark
a NodePool as deletion requested and return the updated *api.NodePool
(idempotent), Delete should remove the entry (idempotent), FindByOwnerID should
return all pools matching ownerID from the in-memory store, and
RequestDeletionByOwner should set deletion-requested (or schedule) on all owner
pools (idempotent); ensure concurrent access is guarded like other mock methods
and use the same struct fields (nodePoolDaoMock) so tests for deletion/cascade
flows behave like the real DAO.

In `@pkg/dao/node_pool.go`:
- Around line 81-103: The RequestDeletion implementation in sqlNodePoolDao
currently calls Save(nodePool) which can overwrite concurrent changes; instead,
after fetching the node via d.Get (and returning if already deleted), perform a
targeted atomic update like in RequestDeletionByOwner: use the session (g2 :=
(*d.sessionFactory).New(ctx)) and call g2.Model(&api.NodePool{}).Where("id = ?
AND deleted_time IS NULL", id).Updates(map[string]interface{}{"deleted_time": t,
"deleted_by": deletedBy, "generation": nodePool.Generation + 1}) (or equivalent)
and check RowsAffected/Err to detect a race, calling db.MarkForRollback(ctx,
err) and returning on error; keep the existing variables (t, deletedBy) and
return the updated nodePool only after the successful Updates call.

In `@pkg/db/migrations/202511111055_add_node_pools.go`:
- Around line 20-25: Don't modify the historical migration 202511111055; instead
create a new migration that updates the node_pools table to match the DAO
changes: add/rename columns (introduce deleted_time TIMESTAMPTZ NULL and
deleted_by VARCHAR(...) or rename deleted_at to deleted_time as needed), drop
and rebuild the old unique index and recreate the new unique index semantics the
DAO expects (e.g., rebuilding the index that previously referenced deleted_at to
reference deleted_time/deleted_by), and ensure the migration uses ALTER TABLE /
CREATE INDEX statements so environments that already ran 202511111055 get the
schema fixes without changing the original migration file.

In `@pkg/services/cluster.go`:
- Around line 116-125: The current delete flow returns early when alreadyDeleted
is true which can cause cluster deletion and nodepool cascade to diverge if
clusterDao.RequestDeletion succeeded but s.nodePoolDao.RequestDeletionByOwner
failed; modify the logic in the deletion handler (the block using
alreadyDeleted, cluster.DeletedTime, clusterDao.RequestDeletion and
s.nodePoolDao.RequestDeletionByOwner) to perform both writes atomically by
wrapping the cluster delete flag and the nodepool RequestDeletionByOwner call in
a single DB transaction, or alternatively implement retry/compensating logic
that, when encountering alreadyDeleted==true and a missing nodepool cascade,
re-attempts RequestDeletionByOwner (using cluster.DeletedTime) until success;
ensure you remove the unsafe early return or guard it behind
transaction/compensation so cascade cannot be skipped permanently.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: e510dabb-4185-4360-9bc6-5136481740f7

📥 Commits

Reviewing files that changed from the base of the PR and between 5db2baf and 5c89c2a.

⛔ Files ignored due to path filters (3)
  • .bingo/golangci-lint.sum is excluded by !**/*.sum
  • .bingo/oapi-codegen.sum is excluded by !**/*.sum
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (29)
  • .bingo/Variables.mk
  • .bingo/oapi-codegen.mod
  • .bingo/variables.env
  • go.mod
  • openapi/openapi.yaml
  • pkg/api/cluster_types.go
  • pkg/api/metadata_types.go
  • pkg/api/node_pool_types.go
  • pkg/api/presenters/cluster.go
  • pkg/api/presenters/node_pool.go
  • pkg/dao/adapter_status.go
  • pkg/dao/cluster.go
  • pkg/dao/mocks/cluster.go
  • pkg/dao/mocks/node_pool.go
  • pkg/dao/node_pool.go
  • pkg/db/migrations/202511111044_add_clusters.go
  • pkg/db/migrations/202511111055_add_node_pools.go
  • pkg/db/migrations/202511111105_add_adapter_status.go
  • pkg/handlers/cluster.go
  • pkg/handlers/cluster_nodepools.go
  • pkg/services/cluster.go
  • pkg/services/cluster_test.go
  • pkg/services/generic.go
  • pkg/services/node_pool.go
  • pkg/services/node_pool_test.go
  • pkg/services/util.go
  • plugins/clusters/plugin.go
  • test/helper.go
  • test/integration/clusters_test.go
💤 Files with no reviewable changes (1)
  • pkg/api/metadata_types.go

Comment on lines +38 to +52
func (d *nodePoolDaoMock) RequestDeletion(ctx context.Context, id string) (*api.NodePool, error) {
	return nil, errors.NotImplemented("NodePool").AsError()
}

func (d *nodePoolDaoMock) Delete(ctx context.Context, id string) error {
	return errors.NotImplemented("NodePool").AsError()
}

func (d *nodePoolDaoMock) FindByOwnerID(ctx context.Context, ownerID string) (api.NodePoolList, error) {
	return nil, errors.NotImplemented("NodePool").AsError()
}

func (d *nodePoolDaoMock) RequestDeletionByOwner(ctx context.Context, ownerID string, t time.Time) error {
	return errors.NotImplemented("NodePool").AsError()
}

⚠️ Potential issue | 🟠 Major

Implement new NodePool mock methods to match deletion/cascade behavior.

All newly added DAO methods return NotImplemented, so this mock cannot support tests for the new deletion/idempotency/cascade flows and diverges from pkg/dao/node_pool.go behavior.

Proposed in-memory mock implementation
 func (d *nodePoolDaoMock) RequestDeletion(ctx context.Context, id string) (*api.NodePool, error) {
-	return nil, errors.NotImplemented("NodePool").AsError()
+	for _, nodePool := range d.nodePools {
+		if nodePool.ID != id {
+			continue
+		}
+		if nodePool.DeletedTime != nil {
+			return nodePool, nil
+		}
+		now := time.Now()
+		deletedBy := "system@hyperfleet.local"
+		nodePool.DeletedTime = &now
+		nodePool.DeletedBy = &deletedBy
+		nodePool.Generation++
+		return nodePool, nil
+	}
+	return nil, gorm.ErrRecordNotFound
 }
 
 func (d *nodePoolDaoMock) FindByOwnerID(ctx context.Context, ownerID string) (api.NodePoolList, error) {
-	return nil, errors.NotImplemented("NodePool").AsError()
+	result := api.NodePoolList{}
+	for _, nodePool := range d.nodePools {
+		if nodePool.OwnerID == ownerID {
+			result = append(result, nodePool)
+		}
+	}
+	return result, nil
 }
 
 func (d *nodePoolDaoMock) RequestDeletionByOwner(ctx context.Context, ownerID string, t time.Time) error {
-	return errors.NotImplemented("NodePool").AsError()
+	deletedBy := "system@hyperfleet.local"
+	for _, nodePool := range d.nodePools {
+		if nodePool.OwnerID == ownerID && nodePool.DeletedTime == nil {
+			nodePool.DeletedTime = &t
+			nodePool.DeletedBy = &deletedBy
+			nodePool.Generation++
+		}
+	}
+	return nil
 }

Comment on lines +81 to +103
func (d *sqlNodePoolDao) RequestDeletion(ctx context.Context, id string) (*api.NodePool, error) {
	g2 := (*d.sessionFactory).New(ctx)

	nodePool, err := d.Get(ctx, id)
	if err != nil {
		db.MarkForRollback(ctx, err)
		return nil, err
	}

	if nodePool.DeletedTime != nil {
		return nodePool, nil
	}

	t := time.Now()
	deletedBy := "system@hyperfleet.local"
	nodePool.DeletedTime = &t
	nodePool.DeletedBy = &deletedBy
	nodePool.Generation++
	if err := g2.Omit(clause.Associations).Save(nodePool).Error; err != nil {
		db.MarkForRollback(ctx, err)
		return nil, err
	}
	return nodePool, nil

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's locate and inspect the node_pool.go file
find . -name "node_pool.go" -path "*/pkg/dao/*" | head -5

Repository: openshift-hyperfleet/hyperfleet-api

Length of output: 130


🏁 Script executed:

# Check the structure around the mentioned lines
cat -n pkg/dao/node_pool.go | sed -n '75,110p'

Repository: openshift-hyperfleet/hyperfleet-api

Length of output: 1256


🏁 Script executed:

# Look for the RequestDeletionByOwner method mentioned in the comment to compare patterns
cat -n pkg/dao/node_pool.go | grep -A 25 "RequestDeletionByOwner"

Repository: openshift-hyperfleet/hyperfleet-api

Length of output: 2156


🏁 Script executed:

# Check the api.NodePool type definition to understand the schema
find . -name "*node_pool_types.go" | head -3

Repository: openshift-hyperfleet/hyperfleet-api

Length of output: 107


🏁 Script executed:

# Examine the type definition
cat pkg/api/node_pool_types.go | head -50

Repository: openshift-hyperfleet/hyperfleet-api

Length of output: 1754


Use targeted Updates with a guard clause instead of full-row Save to avoid lost concurrent updates.

Line 99 uses Save to persist the entire api.NodePool object. GORM's Save writes all columns when a primary key is present, so concurrent updates to unrelated fields like Spec, StatusConditions, or UpdatedBy can be silently lost. The pattern in RequestDeletionByOwner (lines 124–138) shows the correct approach: use a targeted Where("id = ? AND deleted_time IS NULL", id).Updates(...) call to atomically update only the deletion fields while guarding against double-deletion.

Suggested fix
 func (d *sqlNodePoolDao) RequestDeletion(ctx context.Context, id string) (*api.NodePool, error) {
 	g2 := (*d.sessionFactory).New(ctx)
 
 	nodePool, err := d.Get(ctx, id)
 	if err != nil {
 		db.MarkForRollback(ctx, err)
 		return nil, err
 	}
 
 	if nodePool.DeletedTime != nil {
 		return nodePool, nil
 	}
 
 	t := time.Now()
 	deletedBy := "system@hyperfleet.local"
-	nodePool.DeletedTime = &t
-	nodePool.DeletedBy = &deletedBy
-	nodePool.Generation++
-	if err := g2.Omit(clause.Associations).Save(nodePool).Error; err != nil {
-		db.MarkForRollback(ctx, err)
-		return nil, err
-	}
-	return nodePool, nil
+	result := g2.Model(&api.NodePool{}).
+		Where("id = ? AND deleted_time IS NULL", id).
+		Updates(map[string]interface{}{
+			"deleted_time": t,
+			"deleted_by":   deletedBy,
+			"generation":   gorm.Expr("generation + 1"),
+		})
+	if result.Error != nil {
+		db.MarkForRollback(ctx, result.Error)
+		return nil, result.Error
+	}
+	return d.Get(ctx, id)
 }

Comment on lines +20 to 25
CREATE TABLE IF NOT EXISTS node_pools (
	id VARCHAR(255) PRIMARY KEY,
	created_time TIMESTAMPTZ NOT NULL DEFAULT NOW(),
	updated_time TIMESTAMPTZ NOT NULL DEFAULT NOW(),
	deleted_time TIMESTAMPTZ NULL,

⚠️ Potential issue | 🔴 Critical

Don’t rewrite an already-versioned migration in place.

This keeps migration ID 202511111055 but changes the table shape and index semantics. Any environment that already applied the old migration will still have deleted_at and the old unique index, while the new DAO code now writes deleted_time / deleted_by, so DELETE requests will fail after upgrade. Please add a new follow-up migration that renames/adds columns and rebuilds the index instead of editing the historical migration.

Also applies to: 61-63


Comment on lines +116 to +125
	// Already marked for deletion — skip cascade to avoid unnecessary DB roundtrips.
	// The DAO-level cascade method guards with WHERE deleted_time IS NULL, so a repeat call
	// would be a safe no-op, but we short-circuit here for efficiency.
	if alreadyDeleted {
		return cluster, nil
	}
	t := *cluster.DeletedTime

	if err := s.nodePoolDao.RequestDeletionByOwner(ctx, id, t); err != nil {
		return nil, errors.GeneralError("Unable to cascade deletion to nodepools for cluster %s: %s", id, err)

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cd pkg/services && head -n 150 cluster.go | tail -n 50

Repository: openshift-hyperfleet/hyperfleet-api

Length of output: 1964


🏁 Script executed:

cat -n pkg/services/cluster.go | head -n 130 | tail -n 30

Repository: openshift-hyperfleet/hyperfleet-api

Length of output: 1487


🏁 Script executed:

find . -type f -name "*.go" | xargs rg -l "func.*RequestDeletion" | grep -i dao

Repository: openshift-hyperfleet/hyperfleet-api

Length of output: 178


🏁 Script executed:

rg -A 20 "func.*\(.*\).*RequestDeletion\(.*ctx.*id" pkg/services/dao/ --type go

Repository: openshift-hyperfleet/hyperfleet-api

Length of output: 136


🏁 Script executed:

rg -A 20 "func.*RequestDeletion" pkg/dao/cluster.go --type go

Repository: openshift-hyperfleet/hyperfleet-api

Length of output: 771


🏁 Script executed:

rg -A 20 "func.*RequestDeletionByOwner" pkg/dao/node_pool.go --type go

Repository: openshift-hyperfleet/hyperfleet-api

Length of output: 838


🏁 Script executed:

rg -B 5 -A 30 "func.*RequestDeletion.*ctx.*context" pkg/dao/cluster.go --type go | head -n 50

Repository: openshift-hyperfleet/hyperfleet-api

Length of output: 1149


🏁 Script executed:

rg -B 5 -A 10 "sessionFactory\|BeginTx\|Tx\|transaction" pkg/services/cluster.go --type go

Repository: openshift-hyperfleet/hyperfleet-api

Length of output: 61


🏁 Script executed:

cat -n pkg/dao/cluster.go | grep -A 5 -B 5 "sessionFactory"

Repository: openshift-hyperfleet/hyperfleet-api

Length of output: 2762


🏁 Script executed:

rg -A 15 "type SessionFactory" pkg/db/ --type go

Repository: openshift-hyperfleet/hyperfleet-api

Length of output: 534


🏁 Script executed:

rg -A 10 "func.*New.*ctx" pkg/db/ --type go | head -n 40

Repository: openshift-hyperfleet/hyperfleet-api

Length of output: 2239


🏁 Script executed:

rg -A 10 "db_context.Transaction\|BeginTx" pkg/ --type go

Repository: openshift-hyperfleet/hyperfleet-api

Length of output: 61


🏁 Script executed:

rg "Transaction\|BeginTx" pkg/services/cluster.go --type go

Repository: openshift-hyperfleet/hyperfleet-api

Length of output: 61


🏁 Script executed:

rg "RequestDeletion" pkg/services/ --type go -B 2 -A 5

Repository: openshift-hyperfleet/hyperfleet-api

Length of output: 9001


🏁 Script executed:

rg -i "retry\|transaction" pkg/services/cluster.go --type go

Repository: openshift-hyperfleet/hyperfleet-api

Length of output: 61


🏁 Script executed:

rg -A 30 "TestRequestDeletion_CascadesToNodePools\|TestRequestDeletion_AlreadyDeleted" pkg/services/cluster_test.go --type go

Repository: openshift-hyperfleet/hyperfleet-api

Length of output: 61


🏁 Script executed:

cat -n pkg/services/cluster_test.go | grep -A 30 "TestRequestDeletion_CascadesToNodePools"

Repository: openshift-hyperfleet/hyperfleet-api

Length of output: 1134


🏁 Script executed:

cat -n pkg/services/cluster_test.go | grep -A 30 "TestRequestDeletion_AlreadyDeleted"

Repository: openshift-hyperfleet/hyperfleet-api

Length of output: 1230


🏁 Script executed:

cat -n pkg/services/cluster_test.go | grep -A 50 "TestRequestDeletion_CascadesToNodePools" | head -n 60

Repository: openshift-hyperfleet/hyperfleet-api

Length of output: 1898


Partial cascade failures can leave the system inconsistent.

If clusterDao.RequestDeletion succeeds but nodePoolDao.RequestDeletionByOwner fails, subsequent retries will skip the cascade entirely due to the alreadyDeleted early return at line 120, leaving the cluster marked for deletion while its nodepools are not. Wrap both writes in a transaction to ensure atomicity, or add explicit retry logic for the cascade when the cluster is already marked deleted.


}

// Delete soft-deletes a specific nodepool for a cluster
func (h ClusterNodePoolsHandler) Delete(w http.ResponseWriter, r *http.Request) {
Contributor


From /review-pr, this one seems appropriate:

Recommendation 3/7 — Missing ErrorHandler in ClusterNodePoolsHandler.Delete

File: pkg/handlers/cluster_nodepools.go:172
Category: Inconsistency
Severity: Blocking
Confidence: High

Problem:
ClusterHandler.Delete (in pkg/handlers/cluster.go) explicitly sets ErrorHandler: handleError in its handlerConfig. The equivalent handler for node pools (ClusterNodePoolsHandler.Delete) does not set this field — yet both handlers call handleDelete
and return non-trivial errors (not-found, ownership mismatch, presenter errors).

Without ErrorHandler, errors from the Action func will go through whatever the default error handling path is in handleDelete. If the default doesn't call handleError, the node pool DELETE endpoint will produce different (likely incorrect) HTTP error
responses compared to the cluster DELETE endpoint — for example, a 404 might come back as a 500, or the error body format may differ.

The fix is a one-liner:

cfg := &handlerConfig{
	Action: func() (interface{}, *errors.ServiceError) {
		// ...
	},
	ErrorHandler: handleError, // add this
}

GitHub comment (ready to copy-paste):

Warning

Blocking

Category: Inconsistency

ClusterHandler.Delete sets ErrorHandler: handleError in its config, but ClusterNodePoolsHandler.Delete doesn't:

// pkg/handlers/cluster.go — has it
cfg := &handlerConfig{
    Action:       func() (interface{}, *errors.ServiceError) { ... },
    ErrorHandler: handleError,
}

// pkg/handlers/cluster_nodepools.go — missing it
cfg := &handlerConfig{
    Action: func() (interface{}, *errors.ServiceError) { ... },
    // no ErrorHandler
}

Without ErrorHandler: handleError, errors from the nodepool Delete action (not-found, ownership mismatch, presenter failures) may go through a different code path in handleDelete and produce inconsistent HTTP responses (wrong status code, wrong
error body format). One-liner fix: add ErrorHandler: handleError to the nodepool Delete config.


On the other hand, Claude's own /review command says:

  1. ClusterHandler.Delete has redundant explicit ErrorHandler: handleError

handleDelete in framework.go:89-91 already defaults cfg.ErrorHandler to handleError when nil. Setting it explicitly is
harmless but inconsistent with every other handler. Remove it.
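If the framework really does default a nil ErrorHandler to handleError, the two handlers behave identically either way. A minimal sketch of that defaulting, using hypothetical stand-in types rather than the actual framework code:

```go
package main

import "fmt"

// errorHandlerFunc and handlerConfig are illustrative stand-ins for the
// project's types in pkg/handlers; only the defaulting behavior is sketched.
type errorHandlerFunc func(status int) string

func handleError(status int) string { return fmt.Sprintf("handled %d", status) }

type handlerConfig struct {
	ErrorHandler errorHandlerFunc
}

// handleDelete mirrors the fallback described for framework.go:89-91: a nil
// ErrorHandler defaults to handleError, so omitting the field is safe.
func handleDelete(cfg *handlerConfig, status int) string {
	if cfg.ErrorHandler == nil {
		cfg.ErrorHandler = handleError
	}
	return cfg.ErrorHandler(status)
}

func main() {
	explicit := &handlerConfig{ErrorHandler: handleError}
	implicit := &handlerConfig{} // no ErrorHandler set; relies on the default
	fmt.Println(handleDelete(explicit, 404)) // handled 404
	fmt.Println(handleDelete(implicit, 404)) // handled 404
}
```

Either way, both endpoints end up on the same error path; the remaining question is purely one of stylistic consistency across handlers.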


// Set deleted_time, deleted_by, and increment generation to trigger Sentinel reconciliation.
t := time.Now()
deletedBy := "system@hyperfleet.local"

Warning

Blocking

Category: Bug

deleted_by is hardcoded to "system@hyperfleet.local" here and in node_pool.go:95/node_pool.go:130, but created_by and updated_by use the authenticated user's identity from the JWT context. This makes the audit trail for deletions useless — you can never tell who actually requested the deletion.

The fix is to accept deletedBy string as a parameter from the handler layer (where auth.GetAuthPayload(r) is available) and thread it through the service into the DAO, the same way created_by is handled during creation:

// DAO signature
RequestDeletion(ctx context.Context, id string, deletedBy string) (*api.Cluster, bool, error)

// Service signature
RequestDeletion(ctx context.Context, id string, deletedBy string) (*api.Cluster, *errors.ServiceError)

// Handler
user := auth.GetAuthPayload(r).Email  // or however the identity is extracted
cluster, err := h.cluster.RequestDeletion(ctx, id, user)

Also applies to: pkg/dao/node_pool.go:95, pkg/dao/node_pool.go:130


We are doing the same for created_by/updated_by, and we are not yet using authz.

So I would say we discard this as a bug, and set the correct value once we decide how to get it from the ROSA/GCP teams.

"nodepool deleted_time should not change on repeated delete")
Expect(npAfterSecond.Generation).To(Equal(npGenerationAfterFirst),
"nodepool generation should not be incremented on repeated delete")
}

Warning

Blocking

Category: Bug

The new DELETE /clusters/{id}/nodepools/{nodepool_id} endpoint has zero integration tests, while the cluster DELETE has three. The nodepool delete handler includes an important ownership verification guard (nodePool.OwnerID != clusterID → 404) that is completely untested.

Missing test scenarios:

  1. Success — DELETE returns 202 with deleted_time/deleted_by set
  2. Ownership mismatch — DELETE with a valid nodepool ID but wrong cluster ID returns 404
  3. Idempotent — repeated DELETE on same nodepool returns 202 with unchanged deleted_time
  4. Not found — DELETE with nonexistent nodepool ID returns 404

These are required per the JIRA acceptance criteria: "Integration tests verify success ... cascade scenarios".
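A self-contained sketch of what scenarios 1-3 exercise, using a pared-down stand-in for api.NodePool and the handler's ownership guard (names are illustrative, not the real service code):

```go
package main

import (
	"fmt"
	"time"
)

// NodePool is a pared-down stand-in for api.NodePool.
type NodePool struct {
	ID          string
	OwnerID     string
	Generation  int
	DeletedTime *time.Time
}

// requestDeletion sketches the idempotent soft-delete under test: the first
// call stamps DeletedTime and bumps Generation; repeats are no-ops. A wrong
// cluster ID behaves like not-found (the handler's ownership guard).
func requestDeletion(pools map[string]*NodePool, clusterID, id string) bool {
	np, found := pools[id]
	if !found || np.OwnerID != clusterID {
		return false // 404 in the real handler
	}
	if np.DeletedTime == nil {
		t := time.Now()
		np.DeletedTime = &t
		np.Generation++
	}
	return true // 202 in the real handler
}

func main() {
	pools := map[string]*NodePool{
		"np-1": {ID: "np-1", OwnerID: "cluster-a", Generation: 1},
	}
	requestDeletion(pools, "cluster-a", "np-1")
	first := *pools["np-1"].DeletedTime
	requestDeletion(pools, "cluster-a", "np-1") // repeated delete
	fmt.Println(pools["np-1"].DeletedTime.Equal(first), pools["np-1"].Generation)
	// prints: true 2
}
```

The integration tests would assert the same invariants through the HTTP layer: unchanged deleted_time and generation on repeat, 404 on ownership mismatch or unknown ID.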

t := *cluster.DeletedTime

if err := s.nodePoolDao.RequestDeletionByOwner(ctx, id, t); err != nil {
return nil, errors.GeneralError("Unable to cascade deletion to nodepools for cluster %s: %s", id, err)

Warning

Blocking

Category: JIRA

HYPERFLEET-543 scope and acceptance criteria both state:

  • "Cluster DELETE cascades deleted_at to all child nodepools and their adapter_statuses"
  • "NodePool DELETE cascades deleted_at to its adapter_statuses only"

The cascade currently only reaches nodepools — adapter_statuses are not touched. The struck-through acceptance criterion is about "no orphaned adapter_statuses" (a separate concern about eventual cleanup), while the cascade requirement itself is not struck through.

If adapter_statuses cascade was explicitly deferred during the HYPERFLEET-560 spike, please document that decision in the PR description so it's clear this is intentional scope reduction rather than a gap.

t := *cluster.DeletedTime

if err := s.nodePoolDao.RequestDeletionByOwner(ctx, id, t); err != nil {
return nil, errors.GeneralError("Unable to cascade deletion to nodepools for cluster %s: %s", id, err)

Tip

nit — non-blocking suggestion

Category: Pattern

This cascade error path uses raw errors.GeneralError(...) while the cluster's own deletion error (line 110) routes through handleRequestDeletionError which checks for db.IsDBConnectionError and returns a proper 503. If the DB goes down mid-cascade, clients will get a 500 with the raw driver error instead of a clean 503.

Consider routing through the same helper:

if err := s.nodePoolDao.RequestDeletionByOwner(ctx, id, t); err != nil {
    return nil, handleRequestDeletionError("NodePool cascade", err)
}

return nil
}

func (d *sqlNodePoolDao) FindByOwnerID(ctx context.Context, ownerID string) (api.NodePoolList, error) {

Tip

nit — non-blocking suggestion

Category: Improvement

FindByOwnerID is added to the NodePoolDao interface, implemented, and stubbed in both mocks, but never called from any service or handler. The cascade uses RequestDeletionByOwner (bulk UPDATE) directly, so this method has no callers. Consider removing it to keep the interface minimal — it can always be added back when there's an actual use case.

ctx := context.Background()
clusterID := "already-deleted"

DeletedTime := time.Now().Add(-time.Hour)

Tip

nit — non-blocking suggestion

Category: Pattern

Local variable DeletedTime should be deletedTime — uppercase first letter in Go signals an exported identifier, which doesn't apply to function-scoped variables.

@rh-amarin rh-amarin changed the title Hyperfleet 543 - feat: Implement Cluster and NodePool DELETE endpoints with cascade logic HYPERFLEET-543 - feat: Implement Cluster and NodePool DELETE endpoints with cascade logic Apr 14, 2026
@ciaranRoche left a comment

Requesting changes here: we need to ensure we do not modify existing migrations; this is a bad pattern to be in. I know people have done this in the past, but let's start from now to move forward as we mean to go on 🙏

created_time TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_time TIMESTAMPTZ NOT NULL DEFAULT NOW(),
deleted_at TIMESTAMPTZ NULL,
CREATE TABLE IF NOT EXISTS adapter_statuses (

I see others have done this, but it is not great practice. We should not be modifying migrations after the fact: these changes won't be run in any existing deployment, and our partner teams would need to blow away their DB in order to get the modified migrations to work.

Please include a new migration for the changes you want to add here.
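The append-only convention being asked for can be sketched with a minimal migration runner; the types and IDs are illustrative, but real frameworks (golang-migrate, gormigrate) track applied migration IDs the same way, which is why edits to an already-applied entry silently do nothing:

```go
package main

import "fmt"

// migration mimics an append-only migration entry: each schema change is a
// new entry with a fresh ID, never an edit to an existing one.
type migration struct {
	ID  string
	SQL string
}

// pending returns the migrations an existing deployment would still run.
// Entries recorded as applied are skipped, so editing one is a no-op there.
func pending(all []migration, applied map[string]bool) []migration {
	var out []migration
	for _, m := range all {
		if !applied[m.ID] {
			out = append(out, m)
		}
	}
	return out
}

func main() {
	all := []migration{
		{ID: "2026-03_create_clusters", SQL: "CREATE TABLE clusters (...)"},
		// The new column goes in a NEW migration, not into the CREATE above:
		{ID: "2026-04_add_deleted_time", SQL: "ALTER TABLE clusters ADD COLUMN deleted_time TIMESTAMPTZ"},
	}
	applied := map[string]bool{"2026-03_create_clusters": true}
	for _, m := range pending(all, applied) {
		fmt.Println(m.ID) // only the new ALTER runs on existing deployments
	}
}
```

Editing the original CREATE TABLE instead would leave existing databases permanently out of sync with the schema the code expects.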

// Create unique index on resource_type, resource_id, and adapter
// This ensures one adapter status per resource per adapter
createIdxUniqueSQL := "CREATE UNIQUE INDEX IF NOT EXISTS idx_adapter_statuses_unique " +
"ON adapter_statuses(resource_type, resource_id, adapter) WHERE deleted_at IS NULL;"

What is the motivation for dropping the WHERE clause on this index?

Correct me if I am wrong, but this means we block creating new resources with the same name even if an existing one is soft-deleted. Is this by design, or a deliberate decision?
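The behavioral difference the question is about can be sketched in a few lines; activeNameTaken emulates the partial index's WHERE deleted_at IS NULL predicate with an in-memory check (illustrative, not the real DAO):

```go
package main

import "fmt"

// row is a minimal stand-in for an adapter_statuses row.
type row struct {
	Name    string
	Deleted bool
}

// activeNameTaken emulates the partial unique index
// (... WHERE deleted_at IS NULL): soft-deleted rows do not block name reuse.
func activeNameTaken(rows []row, name string) bool {
	for _, r := range rows {
		if r.Name == name && !r.Deleted {
			return true
		}
	}
	return false
}

func main() {
	rows := []row{{Name: "pool-a", Deleted: true}}
	// With the WHERE clause, "pool-a" can be recreated; a full (unfiltered)
	// index would reject the insert even though the old row is soft-deleted.
	fmt.Println(activeNameTaken(rows, "pool-a")) // false
}
```

Dropping the predicate makes uniqueness span soft-deleted rows too, which is exactly the recreate-after-delete scenario the comment raises.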

@openshift-ci

openshift-ci bot commented Apr 14, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from ciaranroche. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@coderabbitai bot left a comment

🧹 Nitpick comments (1)
pkg/services/cluster_test.go (1)

1217-1290: Broaden the cascade tests to catch over-delete and retry bugs.

These cases only prove that owned nodepools can end up deleted. They still pass if RequestDeletion deletes unrelated pools, or if it stops cascading once the cluster is already soft-deleted and a previous attempt only partially finished.

🧪 Suggested test additions
 func TestRequestDeletion_CascadesToNodePools(t *testing.T) {
@@
 	nodePoolDao.nodePools["np-2"] = &api.NodePool{
 		Meta:       api.Meta{ID: "np-2"},
 		OwnerID:    clusterID,
 		Generation: 1,
 	}
+	nodePoolDao.nodePools["np-other"] = &api.NodePool{
+		Meta:       api.Meta{ID: "np-other"},
+		OwnerID:    "another-cluster",
+		Generation: 1,
+	}
@@
 	g.Expect(nodePoolDao.nodePools["np-1"].DeletedTime).ToNot(BeNil())
 	g.Expect(nodePoolDao.nodePools["np-2"].DeletedTime).ToNot(BeNil())
+	g.Expect(nodePoolDao.nodePools["np-other"].DeletedTime).To(BeNil())
 }
 
 func TestRequestDeletion_AlreadyDeleted_IdempotentCascade(t *testing.T) {
@@
-	// Nodepool already soft-deleted from the previous cascade
+	// Simulate a partially completed previous cascade: the cluster is already
+	// soft-deleted, but one owned nodepool is still active.
 	nodePoolDao.nodePools["np-1"] = &api.NodePool{
-		Meta:        api.Meta{ID: "np-1"},
-		DeletedTime: &DeletedTime,
-		OwnerID:     clusterID,
-		Generation:  2,
+		Meta:       api.Meta{ID: "np-1"},
+		OwnerID:    clusterID,
+		Generation: 2,
 	}
@@
-	// Already-deleted nodepool retains its original DeletedTime
-	g.Expect(nodePoolDao.nodePools["np-1"].DeletedTime.Equal(DeletedTime)).To(BeTrue())
+	g.Expect(nodePoolDao.nodePools["np-1"].DeletedTime).ToNot(BeNil(), "retry should finish cascading to remaining nodepools")
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/services/cluster_test.go` around lines 1217 - 1290, Extend the existing
cascade tests to assert that RequestDeletion only marks nodepools owned by the
target cluster (check nodePoolDao.nodePools entries with OwnerID == clusterID
are deleted and those with different OwnerID remain unchanged) and add a
retry/partial-cascade scenario: simulate one nodepool already soft-deleted and
another owned one not deleted, then call service.RequestDeletion again and
assert the remaining owned nodepool gets DeletedTime while unrelated pools still
do not; reference TestRequestDeletion_CascadesToNodePools,
TestRequestDeletion_AlreadyDeleted_IdempotentCascade, and the
service.RequestDeletion, nodePoolDao.nodePools and
api.NodePool.OwnerID/DeletedTime fields when adding these assertions.


📥 Commits

Reviewing files that changed from the base of the PR and between 5c89c2a and 7b2e584.

📒 Files selected for processing (7)
  • CHANGELOG.md
  • pkg/api/metadata_types.go
  • pkg/dao/cluster.go
  • pkg/dao/node_pool.go
  • pkg/middleware/otel_test.go
  • pkg/services/cluster_test.go
  • pkg/services/node_pool_test.go
💤 Files with no reviewable changes (1)
  • pkg/middleware/otel_test.go
✅ Files skipped from review due to trivial changes (1)
  • CHANGELOG.md
🚧 Files skipped from review as they are similar to previous changes (4)
  • pkg/api/metadata_types.go
  • pkg/dao/node_pool.go
  • pkg/dao/cluster.go
  • pkg/services/node_pool_test.go

return nil, errors.NotImplemented("delete")
id := mux.Vars(r)["id"]
ctx := r.Context()
cluster, err := h.cluster.RequestDeletion(ctx, id)

We can validate here that the cluster has already been marked as deleted, avoiding spreading that logic across the service and DAO. WDYT?
