Conversation
Makes it easier to reuse for other code
Mimicks the parallel case, to write FCI independent code
The boundary region does not match what is done for the parallel case, thus porting it to a iterator does not work.
hasBndryUpperY / hasBndryLowerY does not work for FCI and thus the request does not make sense / can be configured to throw. Thus it should not be checked if it is not needed.
BoundaryRegionIter has been delted
This allows to extend the boundary code to place the boundary further away from the boundary.
This is more general and takes the offset() into account, and thus works for cases where the boundary is between the first and second guard cell
Previously only the first boundary point was set
Taken from hermes-3, adopted for higher order
This allows to write code for FCI and non-FCI using templates.
Directly iterate over the points
| std::shared_ptr<YBoundary> getYBoundary(Coordinates* coords, | ||
| YBndryType type = YBndryType::sheath) { | ||
| auto itype = static_cast<int>(type); | ||
| if (coords->ybndrys[itype] == nullptr) { |
There was a problem hiding this comment.
warning: do not use array subscript when the index is not an integer constant expression [cppcoreguidelines-pro-bounds-constant-array-index]
if (coords->ybndrys[itype] == nullptr) {
^| YBndryType type = YBndryType::sheath) { | ||
| auto itype = static_cast<int>(type); | ||
| if (coords->ybndrys[itype] == nullptr) { | ||
| coords->ybndrys[itype] = coords->makeYBoundary(type); |
There was a problem hiding this comment.
warning: do not use array subscript when the index is not an integer constant expression [cppcoreguidelines-pro-bounds-constant-array-index]
coords->ybndrys[itype] = coords->makeYBoundary(type);
^| if (coords->ybndrys[itype] == nullptr) { | ||
| coords->ybndrys[itype] = coords->makeYBoundary(type); | ||
| } | ||
| return coords->ybndrys[itype]; |
There was a problem hiding this comment.
warning: do not use array subscript when the index is not an integer constant expression [cppcoreguidelines-pro-bounds-constant-array-index]
return coords->ybndrys[itype];
^Thanks to Mike Kryjak for noticing.
ZedThree
left a comment
There was a problem hiding this comment.
I've not finished going through this, but it looks like it's almost there. The thing that would make this really nice is to template the new boundary region stuff so we can handle X/Y boundaries without any runtime switches or checking bx/by etc.
There's lots of clang-tidy comments that should be fixed as well.
And more docstrings and explanatory comments please!
| #include <bout/assert.hxx> | ||
| #include <bout/bout_types.hxx> | ||
|
|
||
| namespace parallel_stencil { |
There was a problem hiding this comment.
| namespace parallel_stencil { | |
| namespace bout::parallel_stencil { |
| inline BoutReal pow(BoutReal val, int exp) { | ||
| // constexpr int expval = exp; | ||
| // static_assert(expval == 2 or expval == 3, "This pow is only for exponent 2 or 3"); | ||
| if (exp == 2) { | ||
| return val * val; | ||
| } | ||
| ASSERT3(exp == 3); | ||
| return val * val * val; | ||
| } |
There was a problem hiding this comment.
It doesn't look like we call this with anything other than 2, and from what I can tell, this isn't actually faster than std::pow, so let's just use that?
There was a problem hiding this comment.
I tested it with gcc 15.2.1:
g++ example.cxx -std=c++20 -ffast-math -O2 -march=native -isystem include -Lbuild/src -lbenchmark -lpthread -o mybenchmark && ./mybenchmark 2>&1 | tail -n2
BM_StdPow 0.234 ns 0.234 ns 3000470135
BM_multi 0.444 ns 0.444 ns 1573472940
[dave@raven03: benchmark]👍 g++ example.cxx -std=c++20 -O2 -march=native -isystem include -Lbuild/src -lbenchmark -lpthread -o mybenchmark && ./mybenchmark 2>&1 | tail -n2
BM_StdPow 20.5 ns 20.5 ns 34629261
BM_multi 0.441 ns 0.441 ns 1600645212
So std::pow is faster, but only with -ffast-math.
There was a problem hiding this comment.
Correction: with -ffast-math they are the same speed, otherwise std::pow is a factor 100 slower.
| bndry->setValid(0); | ||
| for (auto pnt : *bndry) { | ||
| for (auto pnt2 : *bndry2) { | ||
| #warning this could likely be done faster |
There was a problem hiding this comment.
I think this is zip? Unfortunately, that's C++23, but we could add an implementation if necessary. Either way, we don't want to merge this with the warning!
| BoutReal& ynext(Field3D& f) const { return f[ind().yp(by).xp(bx)]; } | ||
| const BoutReal& ynext(const Field3D& f) const { return f[ind().yp(by).xp(bx)]; } | ||
| BoutReal& yprev(Field3D& f) const { return f[ind().yp(-by).xp(-bx)]; } | ||
| const BoutReal& yprev(const Field3D& f) const { return f[ind().yp(-by).xp(-bx)]; } | ||
| BoutReal& ythis(Field3D& f) const { return f[ind()]; } |
There was a problem hiding this comment.
I'm a bit confused by these functions, they start with y, but the implementation has .xp in there, so I guess these are supposed to be generic for X and Y boundaries?
There was a problem hiding this comment.
They are only used for y, but could also be used for x.
| const BoutReal& yprev(const Field2D& f) const { return f[ind().yp(-by).xp(-bx)]; } | ||
| #endif | ||
|
|
||
| const int dir; |
There was a problem hiding this comment.
Please sort out all these clang-tidy comments, basically just add getters:
public:
int dir() { return dir_; }
private:
int dir_;plus docstrings
| // check that contravariant tensors are positive (if expected) and finite (always) | ||
| void checkContravariant(); | ||
|
|
||
| mutable std::array<std::shared_ptr<YBoundary>, 3> ybndrys; |
There was a problem hiding this comment.
Where does the 3 come from? How do we check that it is still the correct value?
| return bndry_position != rhs.bndry_position; | ||
| } | ||
|
|
||
| #define ITER() for (int i = 0; i < localmesh->ystart - abs_offset(); ++i) |
There was a problem hiding this comment.
We don't really need a macro for this
| #include <algorithm> | ||
| #include <functional> | ||
|
|
||
| class BoundaryRegionIter { |
There was a problem hiding this comment.
I guess ideally this would be templated on the direction (X or Y)
There was a problem hiding this comment.
Does it need to be templated? It could also just be generic without templating?
Are you worried about performance, or what is your motivation for templating?
This allows to implement unified BCs for yup/ydown as well as FCI / FA.
Taking into acount things like whether the boundary is next to the domain, or between the parallel slices.