[Data Liberation] Topological sorter, entities remapping and add missing imports #2030

Draft
wants to merge 70 commits into
base: trunk
70 commits
3f5d25e
First topological sorter draft
zaerl Nov 26, 2024
49a4486
Move topological sort to separate function
zaerl Nov 26, 2024
81d0d23
Fix: missing importer initialization
zaerl Nov 26, 2024
968777d
Add categories to the sorter
zaerl Nov 26, 2024
1c6b42f
Add new in-place sort
zaerl Nov 27, 2024
7f89e65
Add memory-free functions
zaerl Nov 27, 2024
8bc81d7
Replace bin script with wp-cli command
zaerl Nov 27, 2024
24d9e4a
Add special cases
zaerl Nov 27, 2024
331d322
Change the sorting algorithm to qsort
zaerl Nov 28, 2024
ec07803
Add a TODO
zaerl Nov 28, 2024
8fe8ec6
Update names
zaerl Nov 29, 2024
7b2a1bb
Fix: change variable name
zaerl Nov 29, 2024
3a436b8
Add support for categories
zaerl Nov 29, 2024
8e0c71a
Fix: remove double slashes
zaerl Dec 4, 2024
3a8ab54
Add test check
zaerl Dec 4, 2024
1c102a7
Add new hooks
zaerl Dec 4, 2024
c99aa44
Add new topo sorting query
zaerl Dec 4, 2024
4e16d38
Remove unused check
zaerl Dec 4, 2024
c5bcfe8
Temporarily disable test
zaerl Dec 4, 2024
ad63f50
Remove debug code
zaerl Dec 4, 2024
8587272
Remove rebase artifacts
zaerl Dec 4, 2024
7294ef5
Change to new function signature
zaerl Dec 6, 2024
216393e
Add support for count
zaerl Dec 6, 2024
8484509
Add session to CLI
zaerl Dec 6, 2024
fe21588
Add start session
zaerl Dec 6, 2024
23d78f7
Add support for sessions
zaerl Dec 9, 2024
f2886b6
Add categories check
zaerl Dec 9, 2024
756b0ad
Fix: wrong name
zaerl Dec 9, 2024
544c788
Partial tests rework
zaerl Dec 9, 2024
89b1fd3
Add comments test
zaerl Dec 10, 2024
2c85c20
New sorter indexing
zaerl Dec 11, 2024
691ddaa
Fix: missing key
zaerl Dec 11, 2024
fbc1542
Remove useless code
zaerl Dec 11, 2024
66219ba
Remove SQLite case
zaerl Dec 11, 2024
7d80838
Move plugin methods outside class
zaerl Dec 11, 2024
e79ab84
Create Playground base test class
zaerl Dec 11, 2024
00d8c0a
Fix: wrong keys
zaerl Dec 11, 2024
a73a03e
Add core postmeta_no_cdata test
zaerl Dec 11, 2024
35a8c52
Add core importer tests
zaerl Dec 11, 2024
5f8c905
Add new core importer tests
zaerl Dec 11, 2024
6a2d2f0
Update WXR to last core importer
zaerl Dec 11, 2024
1ed598f
Add support for PHPUnit filters
zaerl Dec 11, 2024
6da413a
Remove old test
zaerl Dec 11, 2024
173c716
Fix: remove debug code
zaerl Dec 11, 2024
08838aa
Fix: wrong check
zaerl Dec 11, 2024
606859a
Add new unit tests and remove old one
zaerl Dec 11, 2024
4c472fc
Add support for term meta
zaerl Dec 12, 2024
b3d70a8
Add comment
zaerl Dec 12, 2024
c9a9170
Rename "elements" to "entities" to match name convention
zaerl Dec 12, 2024
8dea6fc
Remove filters and actions and move mapping to WP_Entity_Importer
zaerl Dec 12, 2024
34a17ca
Fix: remove NOT NULL
zaerl Dec 13, 2024
6cde89f
Add post terms import
zaerl Dec 17, 2024
0b759e8
Fix: use slug instead of the description for categories
zaerl Dec 17, 2024
34e2752
Add new unit tests
zaerl Dec 17, 2024
f6601eb
Fix: remove debug code
zaerl Dec 17, 2024
f58bb44
Add a set_session method
zaerl Dec 18, 2024
7615432
Add support for sessions
zaerl Dec 18, 2024
1aba667
Fix: serialized term meta
zaerl Dec 18, 2024
98565ec
Fix: missing brace
zaerl Dec 18, 2024
787c224
Remove "count" parameter
zaerl Dec 18, 2024
b11fe9b
Add new sorter
zaerl Jan 3, 2025
9d19eb9
Add unit test
zaerl Jan 4, 2025
0b68a60
Removed all changes of #2105 and #2104
zaerl Jan 4, 2025
19db782
Removed import script
zaerl Jan 4, 2025
28fe35d
Fix: remove terms meta from import session
zaerl Jan 4, 2025
7e2c1cf
Fix: restore functions.php file
zaerl Jan 4, 2025
8ed77ed
Add fseek() support
zaerl Jan 7, 2025
2bf73dc
Fix: typo
zaerl Jan 7, 2025
5ae2e14
Fix: set cursor_id to null
zaerl Jan 8, 2025
e3ba973
Fix: rename class to follow new standard
zaerl Jan 8, 2025
1 change: 1 addition & 0 deletions packages/playground/data-liberation/bootstrap.php
@@ -62,6 +62,7 @@
 require_once __DIR__ . '/src/entity-readers/WP_Entity_Reader.php';
 require_once __DIR__ . '/src/entity-readers/WP_HTML_Entity_Reader.php';
 require_once __DIR__ . '/src/entity-readers/WP_WXR_Entity_Reader.php';
+require_once __DIR__ . '/src/entity-readers/WP_WXR_Sorted_Entity_Reader.php';
 require_once __DIR__ . '/src/entity-readers/WP_Directory_Tree_Entity_Reader.php';

 require_once __DIR__ . '/src/xml-api/WP_XML_Decoder.php';
1 change: 1 addition & 0 deletions packages/playground/data-liberation/phpunit.xml
@@ -15,6 +15,7 @@
 			<file>tests/WPXMLProcessorTests.php</file>
 			<file>tests/UrldecodeNTests.php</file>
 			<file>tests/WPStreamImporterTests.php</file>
+			<file>tests/WPWXRSortedReaderTests.php</file>
 		</testsuite>
 	</testsuites>
 </phpunit>
37 changes: 35 additions & 2 deletions packages/playground/data-liberation/plugin.php
@@ -74,6 +74,39 @@ function () {
 	}
 );

+function data_liberation_activate() {
+	// Create tables and option.
+	WP_WXR_Sorted_Entity_Reader::create_or_update_db();
+	update_option( 'data_liberation_db_version', WP_WXR_Sorted_Entity_Reader::DB_VERSION );
+}
+
+// Run when the plugin is activated.
+register_activation_hook( __FILE__, 'data_liberation_activate' );
+
+function data_liberation_deactivate() {
+	// Flush away all data.
+	WP_WXR_Sorted_Entity_Reader::delete_db();
+
+	// Delete the option.
+	delete_option( 'data_liberation_db_version' );
+
+	// @TODO: Cancel any active import sessions and cleanup other data.
+}
+
+// Run when the plugin is deactivated.
+register_deactivation_hook( __FILE__, 'data_liberation_deactivate' );
+
+function data_liberation_load() {
+	if ( WP_WXR_Sorted_Entity_Reader::DB_VERSION !== (int) get_site_option( 'data_liberation_db_version' ) ) {
+		// Update the database with dbDelta, if needed in the future.
+		WP_WXR_Sorted_Entity_Reader::create_or_update_db();
+		update_option( 'data_liberation_db_version', WP_WXR_Sorted_Entity_Reader::DB_VERSION );
+	}
+}
+
+// Run when the plugin is loaded.
+add_action( 'plugins_loaded', 'data_liberation_load' );
+
 // Register admin menu
 add_action(
 	'admin_menu',
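
The activation and `plugins_loaded` hooks in this hunk follow a version-gated upgrade pattern: compare the stored schema version with the one in code, and only run the table setup when they differ. Below is a minimal sketch of that pattern that runs outside WordPress. `Fake_Options`, `SKETCH_DB_VERSION`, and `sketch_maybe_upgrade()` are hypothetical names introduced for illustration; they stand in for the options API and `WP_WXR_Sorted_Entity_Reader::DB_VERSION` and are not part of the PR.

```php
<?php
// Hypothetical stand-in for the WordPress options API, so the sketch runs on its own.
final class Fake_Options {
	private $values = array();

	public function get( $name, $default = false ) {
		return array_key_exists( $name, $this->values ) ? $this->values[ $name ] : $default;
	}

	public function update( $name, $value ) {
		$this->values[ $name ] = $value;
	}
}

// Hypothetical schema version; the PR keeps its own in WP_WXR_Sorted_Entity_Reader::DB_VERSION.
const SKETCH_DB_VERSION = 2;

/**
 * Runs the schema callback only when the stored version differs from the code version.
 *
 * @return bool Whether the callback ran.
 */
function sketch_maybe_upgrade( Fake_Options $options, callable $create_or_update_db ) {
	// Stored options may come back as strings, so cast before the strict comparison.
	$stored = (int) $options->get( 'data_liberation_db_version', 0 );
	if ( SKETCH_DB_VERSION === $stored ) {
		return false;
	}
	$create_or_update_db();
	$options->update( 'data_liberation_db_version', SKETCH_DB_VERSION );
	return true;
}

$options = new Fake_Options();
$runs    = 0;
$upgrade = function () use ( &$runs ) {
	++$runs;
};

$first  = sketch_maybe_upgrade( $options, $upgrade ); // No stored version yet: the callback runs.
$second = sketch_maybe_upgrade( $options, $upgrade ); // Versions match: the callback is skipped.
```

The sketch reads and writes through the same store. In the hunk above the version is written with `update_option()` but read with `get_site_option()`; on a single site these resolve to the same value, but on multisite they may point at different storage, which could make the check run the setup on every load.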
@@ -439,15 +472,15 @@ function data_liberation_create_importer( $import ) {
 			}
 			$importer = WP_Stream_Importer::create_for_wxr_file(
 				$wxr_path,
-				array(),
+				$import,
 				$import['cursor'] ?? null
 			);
 			break;

 		case 'wxr_url':
 			$importer = WP_Stream_Importer::create_for_wxr_url(
 				$import['wxr_url'],
-				array(),
+				$import,
 				$import['cursor'] ?? null
 			);
 			break;
@@ -133,7 +133,7 @@ class WP_WXR_Entity_Reader extends WP_Entity_Reader {
 	 * @since WP_VERSION
 	 * @var WP_XML_Processor
 	 */
-	private $xml;
+	protected $xml;

 	/**
 	 * The name of the XML tag containing information about the WordPress entity
@@ -206,15 +206,23 @@ class WP_WXR_Entity_Reader extends WP_Entity_Reader {
 	 * @since WP_VERSION
 	 * @var int|null
 	 */
-	private $last_post_id = null;
+	protected $last_post_id = null;

 	/**
 	 * The ID of the last processed comment.
 	 *
 	 * @since WP_VERSION
 	 * @var int|null
 	 */
-	private $last_comment_id = null;
+	protected $last_comment_id = null;
+
+	/**
+	 * The ID of the last processed term.
+	 *
+	 * @since WP_VERSION
+	 * @var int|null
+	 */
+	protected $last_term_id = null;

 	/**
 	 * Buffer for accumulating text content between tags.
@@ -229,7 +237,7 @@ class WP_WXR_Entity_Reader extends WP_Entity_Reader {
 	 *
 	 * @var WP_Byte_Reader
 	 */
-	private $upstream;
+	protected $upstream;

 	/**
 	 * Mapping of WXR tags representing site options to their WordPress options names.
@@ -331,6 +339,13 @@ class WP_WXR_Entity_Reader extends WP_Entity_Reader {
 				'wp:term_name' => 'name',
 			),
 		),
+		'wp:termmeta' => array(
+			'type' => 'term_meta',
+			'fields' => array(
+				'wp:meta_key' => 'meta_key',
+				'wp:meta_value' => 'meta_value',
+			),
+		),
 		'wp:tag' => array(
 			'type' => 'tag',
 			'fields' => array(
@@ -343,6 +358,7 @@ class WP_WXR_Entity_Reader extends WP_Entity_Reader {
 		'wp:category' => array(
 			'type' => 'category',
 			'fields' => array(
+				'wp:term_id' => 'term_id',
 				'wp:category_nicename' => 'slug',
 				'wp:category_parent' => 'parent',
 				'wp:cat_name' => 'name',
@@ -351,7 +367,7 @@ class WP_WXR_Entity_Reader extends WP_Entity_Reader {
 		),
 	);

-	public static function create( WP_Byte_Reader $upstream = null, $cursor = null ) {
+	public static function create( WP_Byte_Reader $upstream = null, $cursor = null, $options = array() ) {
 		$xml_cursor = null;
 		if ( null !== $cursor ) {
 			$cursor = json_decode( $cursor, true );
@@ -367,10 +383,11 @@ public static function create( WP_Byte_Reader $upstream = null, $cursor = null )
 		}

 		$xml = WP_XML_Processor::create_for_streaming( '', $xml_cursor );
-		$reader = new WP_WXR_Entity_Reader( $xml );
+		$reader = new static( $xml );
 		if ( null !== $cursor ) {
 			$reader->last_post_id = $cursor['last_post_id'];
 			$reader->last_comment_id = $cursor['last_comment_id'];
+			$reader->last_term_id = $cursor['last_term_id'];
 		}
 		if ( null !== $upstream ) {
 			$reader->connect_upstream( $upstream );
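
The switch from `new WP_WXR_Entity_Reader( $xml )` to `new static( $xml )` relies on PHP's late static binding, which presumably lets a subclass such as `WP_WXR_Sorted_Entity_Reader` reuse the inherited `create()` factory and still get an instance of itself. A minimal sketch of the language feature follows; the `Sketch_*` class names are hypothetical and only illustrate the mechanism.

```php
<?php
class Sketch_Base_Reader {
	public static function create() {
		// `new static` resolves to the class the method was called on.
		// `new self` would always build a Sketch_Base_Reader here.
		return new static();
	}
}

// The subclass inherits create() without overriding it.
class Sketch_Sorted_Reader extends Sketch_Base_Reader {}

$base   = Sketch_Base_Reader::create();
$sorted = Sketch_Sorted_Reader::create();
```

Here `$sorted` is a `Sketch_Sorted_Reader` even though `create()` is defined only on the parent. This also explains why the touched properties move from `private` to `protected`: the subclass instance needs to reach them.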
@@ -416,6 +433,7 @@ public function get_reentrancy_cursor() {
 				'upstream' => $this->last_xml_byte_offset_outside_of_entity,
 				'last_post_id' => $this->last_post_id,
 				'last_comment_id' => $this->last_comment_id,
+				'last_term_id' => $this->last_term_id,
 			)
 		);
 	}
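
The reentrancy cursor now carries `last_term_id`, and `create()` reads it back with a direct array access. If a cursor serialized before this change is ever resumed (an assumption; the PR does not say whether such cursors exist), that key would be missing. A minimal sketch of a tolerant decode, using a hypothetical helper `sketch_decode_cursor()` that is not part of the PR:

```php
<?php
/**
 * Decodes a JSON cursor into the three "last seen" IDs.
 * Keys missing from the cursor default to null instead of raising a warning.
 */
function sketch_decode_cursor( $cursor_json ) {
	$cursor = json_decode( $cursor_json, true );
	if ( ! is_array( $cursor ) ) {
		$cursor = array();
	}
	return array(
		'last_post_id'    => $cursor['last_post_id'] ?? null,
		'last_comment_id' => $cursor['last_comment_id'] ?? null,
		'last_term_id'    => $cursor['last_term_id'] ?? null,
	);
}

// A cursor in the shape used before this change (no last_term_id key).
$old_cursor = json_encode(
	array(
		'last_post_id'    => 10,
		'last_comment_id' => 4,
	)
);

// A cursor in the shape produced after this change.
$new_cursor = json_encode(
	array(
		'last_post_id'    => 10,
		'last_comment_id' => 4,
		'last_term_id'    => 7,
	)
);

$from_old = sketch_decode_cursor( $old_cursor );
$from_new = sketch_decode_cursor( $new_cursor );
```

With this shape, an older cursor resumes with `last_term_id` set to `null`, which matches the property's default.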
@@ -476,6 +494,17 @@ public function get_last_comment_id() {
 		return $this->last_comment_id;
 	}

+	/**
+	 * Gets the ID of the last processed term.
+	 *
+	 * @since WP_VERSION
+	 *
+	 * @return int|null The term ID, or null if no terms have been processed.
+	 */
+	public function get_last_term_id() {
+		return $this->last_term_id;
+	}
+
 	/**
 	 * Appends bytes to the input stream.
 	 *
@@ -560,7 +589,7 @@ public function next_entity() {
 	 *
 	 * @return bool Whether another entity was found.
 	 */
-	private function read_next_entity() {
+	protected function read_next_entity() {
 		if ( $this->xml->is_finished() ) {
 			$this->after_entity();
 			return false;
@@ -870,8 +899,12 @@ private function emit_entity() {
 			$this->entity_data['comment_id'] = $this->last_comment_id;
 		} elseif ( $this->entity_type === 'tag' ) {
 			$this->entity_data['taxonomy'] = 'post_tag';
+			$this->last_term_id = $this->entity_data['term_id'];
 		} elseif ( $this->entity_type === 'category' ) {
 			$this->entity_data['taxonomy'] = 'category';
+			$this->last_term_id = $this->entity_data['term_id'];
+		} elseif ( $this->entity_type === 'term_meta' ) {
+			$this->entity_data['term_id'] = $this->last_term_id;
 		}
 		$this->entity_finished = true;
 		++$this->entities_read_so_far;
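
The `emit_entity()` change links each `term_meta` entity to the most recently emitted tag or category by remembering that term's ID. The same bookkeeping can be sketched over a plain array of entities. `sketch_attach_term_ids()` and the sample data are hypothetical, and the sketch assumes term meta always follows the term it belongs to in the stream.

```php
<?php
/**
 * Walks entities in stream order and stamps each term_meta with the ID
 * of the tag or category that preceded it.
 */
function sketch_attach_term_ids( array $entities ) {
	$last_term_id = null;
	$out          = array();
	foreach ( $entities as $entity ) {
		$type = $entity['type'];
		$data = $entity['data'];
		if ( 'tag' === $type || 'category' === $type ) {
			$data['taxonomy'] = 'tag' === $type ? 'post_tag' : 'category';
			// Remember this term so that following meta rows can refer to it.
			$last_term_id = $data['term_id'] ?? null;
		} elseif ( 'term_meta' === $type ) {
			$data['term_id'] = $last_term_id;
		}
		$out[] = array(
			'type' => $type,
			'data' => $data,
		);
	}
	return $out;
}

$result = sketch_attach_term_ids(
	array(
		array( 'type' => 'category', 'data' => array( 'term_id' => 5, 'slug' => 'news' ) ),
		array( 'type' => 'term_meta', 'data' => array( 'meta_key' => 'color', 'meta_value' => 'red' ) ),
		array( 'type' => 'tag', 'data' => array( 'term_id' => 9, 'slug' => 'php' ) ),
		array( 'type' => 'term_meta', 'data' => array( 'meta_key' => 'icon', 'meta_value' => 'elephant' ) ),
	)
);
```

The first meta row ends up attached to term 5 and the second to term 9. The sketch falls back to `null` when a term has no `term_id`, whereas the diff reads `$this->entity_data['term_id']` directly, so a tag or category without `wp:term_id` in the export may be worth a test case.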