
fix: Add legacy mode handling to cast Decimal to String #3939

Open

parthchandra wants to merge 10 commits into apache:main from parthchandra:cast-primitive-string

Conversation

@parthchandra (Contributor) commented Apr 13, 2026

Which issue does this PR close?

Part of #286
Fixes: #1068

Rationale for this change

CometCast.canCastToString had previously marked DecimalType -> StringType as Compatible, with a caveat. However, the native Decimal128 -> String cast was not explicitly handled in cast_array, so it fell through to DataFusion's built-in cast, which is incompatible with Spark LEGACY mode for large scale values.

For example, consider zero with a large scale: a Decimal(38, 18) zero has exponent -18, which Spark renders as "0E-18" in LEGACY mode, while Comet produced "0.000000000000000000".

TRY and ANSI modes were already compatible.
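A minimal Rust sketch of the two renderings described above (illustrative only, not Comet's actual implementation): Spark LEGACY follows java.math.BigDecimal.toString, which switches to scientific notation when the adjusted exponent drops below -6, so a zero at scale 18 prints as "0E-18", while the plain rendering writes the fractional digits out in full.

```rust
// Hypothetical helpers for the zero-with-large-scale case only.
// BigDecimal.toString renders an unscaled value of 0 with scale > 6
// in scientific notation: "0E-<scale>".
fn legacy_zero(scale: u32) -> String {
    format!("0E-{scale}")
}

// The plain rendering spells out all fractional zeros: "0." + scale zeros.
fn plain_zero(scale: usize) -> String {
    format!("0.{}", "0".repeat(scale))
}

fn main() {
    // Spark LEGACY vs. the pre-fix Comet output for Decimal(38, 18) zero.
    println!("{} vs {}", legacy_zero(18), plain_zero(18));
}
```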

What changes are included in this PR?

Adds explicit handling for LEGACY mode when casting Decimal to String.

How are these changes tested?

Unit tests.

parthchandra force-pushed the cast-primitive-string branch from 6f790e0 to 8acb912 on April 13, 2026 at 22:36
parthchandra force-pushed the cast-primitive-string branch from 569d741 to f570e55 on April 13, 2026 at 23:01
parthchandra marked this pull request as draft on April 14, 2026 at 21:57
parthchandra marked this pull request as ready for review on April 15, 2026 at 16:28
@parthchandra (Contributor, Author) commented:

@andygrove @kazuyukitanimura this is ready for review

parthchandra force-pushed the cast-primitive-string branch from e6ae16b to 891dd70 on April 15, 2026 at 22:03
let scale_u = scale as usize;
let num_digits_u = num_digits as usize;
if scale_u == 0 {
    format!("{sign}{coeff}")
Member:
not a blocker, but this is doing a new string allocation per row. It could be more efficient to have a re-usable buffer to reduce allocations.
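The reviewer's suggestion can be sketched as follows (names are illustrative, not Comet's actual API): instead of calling format!, which allocates a fresh String per row, write each row into a single reusable buffer whose capacity survives across rows.

```rust
use std::fmt::Write;

// Hypothetical per-row formatter that reuses one String buffer.
fn format_decimal_into(buf: &mut String, sign: &str, coeff: &str) {
    buf.clear(); // drops the contents but keeps the allocated capacity
    // Writing into a String is infallible, so unwrap cannot panic here.
    write!(buf, "{sign}{coeff}").unwrap();
}

fn main() {
    // One allocation up front; subsequent rows reuse the same backing storage.
    let mut buf = String::with_capacity(40);
    for (sign, coeff) in [("-", "123"), ("", "42")] {
        format_decimal_into(&mut buf, sign, coeff);
        println!("{buf}");
    }
}
```

Each formatted row would still need to be copied into the output StringBuilder/array, but the intermediate per-row heap allocation goes away.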

val allowNegativeScale = SQLConf.get
  .getConfString("spark.sql.legacy.allowNegativeScaleOfDecimal", "false")
  .toBoolean
if (allowNegativeScale) Compatible() else Incompatible()
Member:
Is Comet now compatible across all eval modes?

}
}

test("cast TimestampType to StringType - ancient timestamps") {
Member:
Some of these tests seem unrelated to casting decimal to string

// CORRECTED mode writes timestamps as proleptic Gregorian without rebase.
// Required because generateTimestamps() includes pre-1900 values (e.g. 1900-06-15)
// which trigger INT96's default EXCEPTION mode when written with certain timezones.
withSQLConf("spark.sql.parquet.int96RebaseModeInWrite" -> "CORRECTED") {
Member:
Unrelated to the PR goal, may be better in a separate PR

parthchandra changed the title from "fix: Add legacy mode handling to cast Decimal to String" to "fix: correct cast of all primitive types and decimal to StringType" on Apr 17, 2026
parthchandra changed the title back to "fix: Add legacy mode handling to cast Decimal to String" on Apr 17, 2026


Development

Successfully merging this pull request may close these issues.

Make decimal to string cast fully compatible with Spark
