-
Notifications
You must be signed in to change notification settings - Fork 96
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Random test suite crash #514
Comments
Oh yeah, I ran into it once as well. It passed the next time I ran it, though. Maybe, like it said, the temp filename was coincidently the same as last time the test was done, and the file isn't cleared? |
No, they're checked for collisions. Not saying it's because of your changes, I just didn't see it before. |
Been a while since I've seen this. I'd start with checking to ensure any test that mutates an on-disk fixture is using the |
Hmm, I was able to reproduce the crash on However, running |
Caught it with |
Great tip! Am I right that both |
They should be dynamically linked. I think SQLite is pretty robust, at least in some build configurations (unlike.. ahem.. netcdf). And I couldn't trigger it when filtering by |
I still had gdb running, so I checked the other threads:
So indeed, it looks like proj is also calling into SQLite at the same time, not that it proves anything. EDIT: another one, this time with debug symbols:
The crash is at:
So https://github.com/OSGeo/gdal/blob/6133cf34a78077998406c0c4045bf51f06e3f49d/ogr/ogrsf_frmts/sqlite/ogrsqlitevirtualogr.cpp#L2611 looks just about right.
However,
So it looks like the |
CC @rouault I don't know if I can reproduce this in pure GDAL. It might be a race on our side between I managed to reproduce this using 3 tests:
Anyway, taking a closer look at the assembly, That seems to be initialized through But the stack trace has:
So I don't really understand how this is happening, but it's probably caused by that |
Wow, nice work, and nice case study in using |
https://sqlite.org/src/file?name=src/sqlite3ext.h #define sqlite3_create_module_v2 sqlite3_api->create_module_v2 Right, well that explains the indirect call in the asm 🤦♂️. |
@lnicola I've tried to reproduce on my system using GDAL master and SQLite 3.31.4 (ubuntu 20.04), but What could help perhaps is if you could add the following debug trace to your GDAL build:
|
Will try, I have to build it first. It's not too easy to reproduce, I was continuously running that in a loop, over 10 to 20 instances, and it still took a couple of minutes. In any case, in my crash it's |
Extermely weird. I don't see how that happen. Even if OGR2SQLITE_static_register() was called with _pApi == nullptr (which can only happen if SQLite is built with SQLITE_OMIT_LOAD_EXTENSION), then the following will assign default pointers:
And sqlite3_api is declared static at line 144 ( |
We only call |
The other possibility is that it's racing against libproj's usage of SQLite. Though it seems less likely (I didn't check, but I assume it's not setting itself up as an extension or anything weird like that). |
I added that debug-print (with the thread id too) and the result was not what I expected: Successful run:
Failed run:
(tested on GDAL 3.7.3 with SQLite 3.42.0) To reproduce:
(for POSIX shells that should be |
With more logging:
So, without having looked in detail at the rest of the code:
|
OGR2SQLITE_Register is only called by OGRSQLiteBaseDataSource::OpenOrCreateDB(), not at GDALAllRegister() time |
Of course, not sure what I was seeing. With more logging
|
I still don't manage to reproduce within a fedora:39 Docker image, even while looping for ever. My hypothesis to explain what you observe would be:
If you have the possibility of building sqlite from source, one possibility would be to try to remove the optim https://github.com/sqlite/sqlite/blob/89efa897780db03eac974eb6b0e041cfb7c39733/src/loadext.c#L873 and see if that makes any difference |
OSGeo/gdal#9068 should likely stop the segfault, but there's still something missing to understand the full picture |
I don't know if I can add something valuable to the dscussion, but we've recently added Looking at this thread I noticed that |
Just ran into this on #510 (but can't reproduce it easily):
That looks like a temporary file name, not sure what's going on.
The text was updated successfully, but these errors were encountered: