Add Dataset::maybe_batch for faster database write #591

base: master

Conversation
@lnicola Sorry for the new pull request. This should be ready now.

Come to think of it, the API I'd really want looks like:

```rust
// this will keep track of how many written/updated features there were
let mut tx = ds.batching_transaction(BATCH_SIZE)?;
for row in df {
    write_feature(&tx, row)?;
}
tx.commit()?;
```

But I guess that's probably too opinionated.
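For context, one rough shape that API could take (purely hypothetical; `batching_transaction` and everything below does not exist in the crate):

```rust
use gdal::errors::Result;
use gdal::Dataset;

// Hypothetical guard behind `ds.batching_transaction(BATCH_SIZE)`: it counts
// writes, and commits + reopens a transaction every `batch_size` features,
// so the caller never manages commits mid-loop.
struct BatchingTransaction<'a> {
    ds: &'a mut Dataset,
    batch_size: usize,
    pending: usize, // writes since the last commit
    written: usize, // total features written/updated, as imagined above
}

impl<'a> BatchingTransaction<'a> {
    fn write_feature(&mut self /* , row: ... */) -> Result<()> {
        // ... write one feature through `self.ds` ...
        self.pending += 1;
        self.written += 1;
        if self.pending == self.batch_size {
            // ... commit the current transaction and start the next one ...
            self.pending = 0;
        }
        Ok(())
    }

    fn commit(self) -> Result<usize> {
        // ... commit whatever is still pending ...
        Ok(self.written)
    }
}
```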
@lnicola Let me put the response to #584 (comment) here: the main reason I prefer the closure-based API over the guard-object-based API is that the former makes it much clearer to handle error cases. With the guard object, you can easily end up in a situation where the transaction is neither committed nor rolled back.
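To illustrate that pitfall (a hypothetical sketch, not from the thread; `write_feature` is a stand-in, while `start_transaction`/`commit` are the crate's existing guard API):

```rust
use gdal::errors::Result;
use gdal::Dataset;

fn write_feature(_ds: &Dataset) -> Result<()> {
    // ... create and write one feature ...
    Ok(())
}

// With a guard object, an early `?` return drops `tx` with the transaction
// neither explicitly committed nor rolled back; a closure-based API instead
// funnels every outcome through one place that commits or rolls back.
fn guard_pitfall(ds: &mut Dataset) -> Result<()> {
    let tx = ds.start_transaction()?;
    write_feature(&tx)?; // an Err here skips `tx.commit()` below
    tx.commit()
}
```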
src/vector/transaction.rs
Outdated
```rust
let force = 0; // since this is for speed
let rv = unsafe { gdal_sys::GDALDatasetStartTransaction(self.c_dataset(), force) };
let res = func(self);
if rv == OGRErr::OGRERR_NONE {
    let rv = unsafe { gdal_sys::GDALDatasetCommitTransaction(self.c_dataset()) };
    if rv != OGRErr::OGRERR_NONE {
        return Err(GdalError::OgrError {
            err: rv,
            method_name: "GDALDatasetCommitTransaction",
        });
    }
}
```
I don't think the implementation is correct yet.
Something like this is required to handle at least basic errors.
Suggested change:

```rust
let force = 0; // since this is for speed
let rv = unsafe { gdal_sys::GDALDatasetStartTransaction(self.c_dataset(), force) };
let res = func(self);
if rv == OGRErr::OGRERR_NONE {
    // check the result of the user's write process:
    // if it's an error we want to roll back, otherwise commit
    match &res {
        Ok(_) => {
            let rv = unsafe { gdal_sys::GDALDatasetCommitTransaction(self.c_dataset()) };
            if rv != OGRErr::OGRERR_NONE {
                // if commit returns an error, at least try to roll back
                // to keep the dataset in a usable state
                let rv2 = unsafe { gdal_sys::GDALDatasetRollbackTransaction(self.c_dataset()) };
                if rv2 != OGRErr::OGRERR_NONE {
                    // if rollback also returns an error we
                    // want to return the rollback error
                    return Err(GdalError::OgrError {
                        err: rv2,
                        method_name: "GDALDatasetRollbackTransaction",
                    });
                } else {
                    // otherwise return the commit error
                    return Err(GdalError::OgrError {
                        err: rv,
                        method_name: "GDALDatasetCommitTransaction",
                    });
                }
            }
        }
        // if the user returned an error, roll back the changes
        Err(_) => {
            let rv = unsafe { gdal_sys::GDALDatasetRollbackTransaction(self.c_dataset()) };
            if rv != OGRErr::OGRERR_NONE {
                return Err(GdalError::OgrError {
                    err: rv,
                    method_name: "GDALDatasetRollbackTransaction",
                });
            }
        }
    }
}
res
```
There shouldn't be a rollback in this transaction. The reason I'm making this is to have a uniform API for drivers that support transactions and those that do not. So for GPKG it'll use transactions to speed up writing, but for json or shp it will not. The name `maybe_transaction` might be a bit misleading, but it means it may be using a transaction to speed things up.

And since we might get any sort of driver, changes should not be rolled back, as that can't be done in all of them. If there's a commit problem, it should error out, and the dev should consider exiting with a partially written database.
@weiznich I think there might have been a misunderstanding about the purpose of this function. I want to clarify that the added function is for speeding up the db writing when it can (using a transaction). The reason we are not forcing a transaction is that we don't want the rollback feature, for uniformity between databases where the transaction process succeeds and those where it does not; either way we should write to the database. So the algorithm is:

1. Try to start a transaction, ignoring the failure if transactions are unsupported.
2. Run the user's write function on the dataset.
3. Commit the transaction if one was started, regardless of the write result.
4. Return the result of the write function.

Step 3 is there so that there wouldn't be a dangling transaction even if the database write fails; since we don't want rollback, we don't need to check for error/success of the database write. And there is no rollback on step 3 or 4 because, if the database didn't support transactions at all, we couldn't do it anyway. So we can only commit, for uniformity. Commit failing is a critical error, so maybe we should add more documentation explaining how that will break the uniformity.

@lnicola Yeah, the batch size part seems like something more advanced than needed for now. It might be nice if the drivers not supporting transactions could also use the batch size somehow. But without that, I think the closure itself is sufficient... unless there is a limit (that people are likely to hit) on the number of uncommitted operations a transaction can hold.
On the subject of naming, I think I like something with "batch" in it. Regarding the implementation: …
I agree on the implementation points. About the name, do you think it's ok to name it `maybe_batch`?
Force-pushed from 1994f72 to b3e28f5.
Changed the title: "Dataset::maybe_transaction for faster database write" → "Dataset::maybe_batch for faster database write".
Same issue with the memory leak here. I'm away from keyboard for the next 2 hrs so I can't squash the commits. Please squash them if you merge.
Okay, it looks like we should roll back. Even the docs say that …
Hmm... We could roll back when a transaction commit fails, but that would break the uniformity. I think we might still have to send an error; maybe we need to make it respond with something like: …

That way the library user can know when a transaction was attempted and failed, vs. when it wasn't and failed.

EDIT: If we roll back, it has to be just before the …
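One possible reading of the elided suggestion (a hypothetical sketch; no such variants exist in `GdalError` today):

```rust
use gdal::errors::GdalError;

// Hypothetical error shape distinguishing "a transaction was attempted and
// its commit failed" from "the write itself failed", so the caller can tell
// the two partial-write situations apart.
pub enum BatchWriteError {
    /// The user closure failed; without a transaction the dataset may
    /// already hold partial writes.
    Write(GdalError),
    /// The closure succeeded, but committing the transaction failed.
    Commit(GdalError),
}
```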
Sorry if the previous comment was confusing. What I mean is, suppose we have something like this:

```rust
ds.maybe_batch(|d| {
    d.do_sth(....) // succeeds
    d.do_sth_else(...) // fails
})
```

Now, if we have a JSON dataset, it'll already have the changes from the first call but not the second, and we can't roll them back; not rolling back anywhere keeps both kinds of datasets uniform. And if starting a transaction fails in a database that supports it, we are fine: we'll just assume it doesn't support transactions and continue, so I don't think there is a need to propagate that error. But if we roll back when the function fails, then the two cases will leave different things in the database. That's why I was talking about not rolling back. When the closure runs successfully but the commit fails, we will once again have different results on the two sides; we can't help that, but wanting to distinguish it from the closure failing, I was talking about maybe returning …
I would expect it to roll back here. I see these as a "best-effort transaction": it gets committed at the end because it doesn't make sense otherwise, and if one operation fails, we try to roll back if possible, otherwise tough luck. Generally, you can't …

See above: even though I ran …
Moving this from #585:

I don't know if there is a case where GPKG will fail to start a transaction while you can still write to the dataset normally. But if there is, the current implementation will fall back to writing normally, whereas if we propagate that error, we'll error out when we don't need to. Again, I don't know if that can ever happen, so if you think it's not something we should worry about, I can change this function to propagate all the transaction-related errors as well as the error from the closure. We can use the …
Ooo, thank you for that example. I'll update the code to include a rollback in case the closure fails, and propagate that. Should I wait until #585 is merged so that I don't have to rebase again?
Really, if starting a transaction is supported but fails, that's a really bad sign; we should give up.

I guess it's not what you're saying, but we should check like this instead: …
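Judging by the implementation that follows, the elided snippet is presumably the unsupported-operation check, along these lines:

```rust
// If the driver reports that transactions are unsupported, run `func`
// directly instead of treating the failed start as an error:
if rv == OGRErr::OGRERR_UNSUPPORTED_OPERATION {
    return func(self);
}
```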
This will be what I'll commit once it's merged:

```rust
/// Optionally start a transaction before running `func`, for performance
///
/// This uses a transaction if the dataset supports it; otherwise it
/// runs `func` on the dataset as it is. If the closure results in an
/// error and a transaction is active, the transaction is rolled back.
pub fn maybe_batch(&mut self, func: impl Fn(&Dataset) -> Result<()>) -> Result<()> {
    let force = 0; // since this is for speed
    let rv = unsafe { gdal_sys::GDALDatasetStartTransaction(self.c_dataset(), force) };
    if rv == OGRErr::OGRERR_UNSUPPORTED_OPERATION {
        // driver has no transaction support: run the closure directly
        func(self)
    } else if rv == OGRErr::OGRERR_NONE {
        let res = func(self);
        if res.is_ok() {
            let rv = unsafe { gdal_sys::GDALDatasetCommitTransaction(self.c_dataset()) };
            if rv != OGRErr::OGRERR_NONE {
                Err(GdalError::OgrError {
                    err: rv,
                    method_name: "GDALDatasetCommitTransaction",
                })
            } else {
                res
            }
        } else {
            let rv = unsafe { gdal_sys::GDALDatasetRollbackTransaction(self.c_dataset()) };
            if rv != OGRErr::OGRERR_NONE {
                Err(GdalError::OgrError {
                    err: rv,
                    method_name: "GDALDatasetRollbackTransaction",
                })
            } else {
                res
            }
        }
    } else {
        // transactions supported, but starting one failed
        Err(GdalError::OgrError {
            err: rv,
            method_name: "GDALDatasetStartTransaction",
        })
    }
}
```
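For reference, a usage sketch of the function above (`Row` and `write_feature` are stand-ins, not part of the crate):

```rust
use gdal::errors::Result;
use gdal::Dataset;

struct Row; // placeholder for whatever the application iterates over

fn write_feature(_ds: &Dataset, _row: &Row) -> Result<()> {
    // ... create and write one feature ...
    Ok(())
}

// All rows go through one call: inside a transaction on drivers that
// support it (e.g. GPKG), as plain sequential writes otherwise (e.g. GeoJSON).
fn write_all(ds: &mut Dataset, rows: &[Row]) -> Result<()> {
    ds.maybe_batch(|d| {
        for row in rows {
            write_feature(d, row)?;
        }
        Ok(())
    })
}
```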
Force-pushed from 473a01d to b233c3f.
If …
I made 3 branches in the conditional: transaction not supported; supported and succeeded; and supported but failed. It propagates the closure's error in all cases except "supported but failed", because we don't call the closure at all in that case. See the …
src/vector/transaction.rs
Outdated
```rust
} else {
    let rv = unsafe { gdal_sys::GDALDatasetRollbackTransaction(self.c_dataset()) };
    if rv != OGRErr::OGRERR_NONE {
        Err(GdalError::OgrError {
```
Here `res` is `Err`, but we return the rollback error.
Should we print something to warn about the rollback then? I think it's important to show we couldn't roll back, especially if it means they can't operate on the dataset anymore.
Not sure. If you're using this API, I think you know you're eschewing most transactional guarantees, and I'm inclined to say that the first error is the most important one. If your disk is failing or something, that's when it happened.

I'm not sure if I mentioned this, but my use case for this feature is a batch process that writes a new dataset at the end. If it crashes, I'll just restart it. Rolling back after an error is nice, but I'm never going to look at the output.
Ok, ignored the rollback error. Left a comment there just in case.
Force-pushed from b233c3f to b27014d.
Thanks for your work on this. I'll be going now, but I'll try to finish #589 tomorrow, then look some more into this.
src/test_utils.rs
Outdated
```rust
    .tempfile()
    .unwrap()
    .into_parts();
file.write_all(&input_data).unwrap();
```
Couldn't we just use `std::fs::copy(path, path_of_the_tempfile)`? That would save us from copying the data first into memory and then back to disk. (Given that this is test code, this is likely irrelevant to fix on its own.)
I just copied the function `open_gpkg_for_update`, but I have modified it based on what you said.

Irrelevant here, but I was thinking of adding a `copy_file` option to `tempfile::Builder`, so that we can just provide a file and it'll make a temporary file that is a copy of it using `std::fs::copy`. That would save a lot of lines of code, but it's an external package; maybe a pull-request idea.
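A sketch of the helper as adjusted per the suggestion (the name `copy_fixture` and the `.gpkg` suffix are illustrative):

```rust
use std::fs;
use std::path::Path;
use tempfile::TempPath;

// Copy a checked-in fixture to a temporary path the test may mutate,
// using fs::copy instead of reading into memory and writing back.
fn copy_fixture(src: &Path) -> TempPath {
    let temp_path = tempfile::Builder::new()
        .suffix(".gpkg")
        .tempfile()
        .unwrap()
        .into_temp_path();
    fs::copy(src, &temp_path).unwrap();
    temp_path
}
```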
Rollback on drop is what I would expect, yes.
Indeed, that's a little unfortunate. I think it's similar to how files get closed on drop and you can't be sure if the cached data reached the disk. Even …

On the other hand, we're looking at an API that's "best-effort transactional". There's no guarantee you get a transaction (you can't even check), so there's no guarantee you can roll back anything. That's why I prefer a name like …
I don't follow. I know why dangling transactions are bad, and how keeping a transaction alive for a long time can lead to that. But a failed rollback is not something you'll see very often; it's usually a network error, which the database will detect and handle correctly. If the …

But I can see this as a good argument for introducing one or two dedicated error variants.

Which touches on another potential limitation of this API: the callback must return our error type, unless we choose to box it.
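On that last point, a boxed-error variant of the signature might look like this (hypothetical; shown as a free function, with the transaction logic elided):

```rust
use std::error::Error;
use gdal::Dataset;

// Hypothetical: accept any error type from the callback by boxing it.
// The tradeoff is that callers lose the concrete GdalError type.
pub fn maybe_batch_boxed<F>(ds: &mut Dataset, mut func: F) -> Result<(), Box<dyn Error>>
where
    F: FnMut(&mut Dataset) -> Result<(), Box<dyn Error>>,
{
    // ... same start/commit/rollback logic, with GDAL errors boxed too ...
    func(ds)
}
```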
Force-pushed from 59d2a7a to 1b0f0b6.
@lnicola First of all, I missed that this is a "best-effort transactional" API and not something that needs to guarantee it. That relaxes the requirements a bit.

That's mostly correct for synchronous Rust; for async Rust things become more complex… It's just that I've spent too much time on designing APIs for that… In the end, the relevant difference in this case is just that you can communicate the rollback error more easily.
Instead of variants, do you think returning … could work?
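If the elided idea was a nested result (one possible reading, given the "another data structure" remark below), the signature would be roughly:

```rust
use gdal::errors::GdalError;
use gdal::Dataset;

// Hypothetical: the outer Result reports transaction-machinery failures,
// the inner one the closure's own failure -- no new variants, but callers
// must unwrap two layers.
fn run_in_batch_nested(
    ds: &mut Dataset,
    mut func: impl FnMut(&mut Dataset) -> Result<(), GdalError>,
) -> Result<Result<(), GdalError>, GdalError> {
    // ... start/commit elided ...
    Ok(func(ds))
}
```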
The second commit is for the test utility, so I think it's fine not to squash it, right?
That just makes my head hurt, sorry 🙃.
No problem. I was just suggesting it because then we don't have to make another data structure, but yeah, the readability is not nice. :D
Sorry, I was away the past week. I guess we can postpone the separate rollback error variant, but how do you feel about renaming this to `run_in_batch` or something similar?
I renamed it to `run_in_batch`.
CHANGES.md
Outdated
```md
@@ -24,6 +24,7 @@
- Add `Defn::geometry_type` ([#562](https://github.com/georust/gdal/pull/562))
- Add `Defn::field_index` and `Feature::field_index` ([#581](https://github.com/georust/gdal/pull/581))
- Add `Dataset::has_capability` for dataset capability check ([#581](https://github.com/georust/gdal/pull/585))
- Add `Dataset::maybe_batch` which always succeeds to use transaction when possible for speed reasons -- rollbacks if operation fails under transaction ([#584](https://github.com/georust/gdal/pull/584))
```
Nit, suggested change:

```md
- Add `Dataset::maybe_batch` which runs a user function inside a transaction if the dataset supports it, as an optimization for I/O; if the function fails, the transaction may or may might be rolled back ([#584](https://github.com/georust/gdal/pull/584))
```
may or may not*
src/vector/transaction.rs
Outdated
```rust
/// This uses transaction if the dataset supports it, otherwise it
/// runs the `func` function as it is on the dataset. If the
/// closure results in error, and it is a transaction, it is
/// rolled back. If there is error in rollback it is ignored.
```
Suggested change:

```rust
/// rolled back. If the rollback fails, it is not reported.
///
/// <div class="warning">This function offers no ACID guarantees and
/// no way to tell if the rollback succeeded or not. Do not use it if you
/// care about your destination dataset.</div>
```
I didn't check if this renders fine; please try `cargo doc --open` before amending.
src/vector/transaction.rs
Outdated
```rust
    }
} else {
    let _ = unsafe { gdal_sys::GDALDatasetRollbackTransaction(self.c_dataset()) };
    // ignoring rollback error
```
Suggested change:

```rust
// ignore rollback error because it's not guaranteed to be
// supported, and the `func` failure is more important.
```
Force-pushed from e387814 to a77e308.
r? @jdroenner since this isn't a very clean API.

Ok. Renamed, and added the suggestions. I also redid the commits to make it easier to merge, just in case.
Not really sure what I can add to the discussion. I guess it is a good thing to make it easier to use transactions.
Use case: any user-facing application/API. Think of a simple application that takes a vector file, does something, and writes a vector file. We want to give the user the freedom to write to any vector file format, while using a transaction to speed up the writing process when the format supports it. That way the user can choose GeoJSON or similar if required, but can also choose GPKG and get the benefit of faster writes.
Imagine you're writing an …
I'm not really a fan of the design with the closure. In my opinion we could add a type like …
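A guard type along those lines might be shaped like this (hypothetical sketch; `MaybeTransaction` does not exist in the crate):

```rust
use gdal::errors::Result;
use gdal::Dataset;

// Hypothetical guard that is a transaction when the driver supports one
// and a no-op otherwise; Drop rolls back only a real, uncommitted transaction.
struct MaybeTransaction<'a> {
    ds: &'a mut Dataset,
    in_transaction: bool, // false for drivers without transaction support
}

impl<'a> MaybeTransaction<'a> {
    fn commit(mut self) -> Result<()> {
        // ... commit the underlying transaction if one was started ...
        self.in_transaction = false; // prevent the rollback in Drop
        Ok(())
    }
}

impl<'a> Drop for MaybeTransaction<'a> {
    fn drop(&mut self) {
        if self.in_transaction {
            // ... best-effort rollback; errors ignored in Drop ...
        }
    }
}
```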
It should. And it probably should be …

I'm not either, but it's not a bad design (certainly not as bad as …).
I think we have that today, but you still need to put your code in a closure (untested):

```rust
let write = move |ds: &mut Dataset| -> Result<()> {
    let layer = ds.layer(0)?;
    let feature = ...;
    feature.create(&layer)
};

match dataset.start_transaction() {
    Ok(mut tx) => {
        write(&mut tx)?;
        tx.commit()?;
    }
    Err(GdalError::OgrNotSupported) => write(&mut dataset)?,
    Err(e) => return Err(e),
}
```

which is what I proposed in #582 (comment). I don't remember if it actually works or not, but it's not very nice.
That pattern doesn't work because `dataset` is borrowed as mut on `start_transaction`. But yeah, we do have …
Yeah, I suspected this was the case. It works with …
Force-pushed from a77e308 to 39d79bf.
Made the closure `FnMut`.
@Atreyagaurav given the lack of consensus here, do you think #585 would be enough for you in the short/medium term? I'd very much prefer to have a helper like this one, but I'd also like to get a release out "soon". (Also, sorry for the delay.)
No problem. We can leave it unmerged for now and revisit if someone needs this feature, or when there is more interest in this PR later. I already have a solution for my use case with a similar function, and using #585 will make this easier to do with a conditional for others as well.
I added an entry to CHANGES.md if knowledge of this change could be valuable to users.

Supersedes #584
if knowledge of this change could be valuable to users.Superseeds #584