
MABS: Don't always fetch all revisions
Closed, Resolved, Public

Description

In import_ref_by_revs(), the following code makes the initial clone fetch far too many pages:

my $revision_ids = [ $fetch_from .. $last_remote ];
return $self->import_revids( $fetch_from, $revision_ids, $pages );

On the first run, $fetch_from is 0 and $last_remote is the last revid of the most recently edited page, so the range covers every revision on the wiki. We do not want all pages; we only want the pages in our tracked subset.
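A minimal sketch of the intended behaviour, assuming the revision ids of each tracked page have already been retrieved (for example from the MediaWiki API) into a hash keyed by page title. The helper name `revids_for_pages` and the hash layout are hypothetical, not part of MABS. Instead of the contiguous range `$fetch_from .. $last_remote`, it collects only the revids of tracked pages that are newer than `$fetch_from`.

```perl
use strict;
use warnings;

# Hypothetical helper: given the last revid already imported and a hash
# mapping each tracked page title to an array ref of its revision ids,
# return the sorted, de-duplicated revids that still need importing.
sub revids_for_pages {
    my ( $fetch_from, $page_revs ) = @_;
    my %seen;
    my @wanted = grep { $_ > $fetch_from && !$seen{$_}++ }
                 map  { @{ $page_revs->{$_} } } keys %$page_revs;
    return [ sort { $a <=> $b } @wanted ];
}

# Illustrative data only: two tracked pages on a wiki whose latest revid is 500.
my %page_revs = (
    Atlanta => [ 12, 87, 430 ],
    Toledo  => [ 15, 301 ],
);
my $revision_ids = revids_for_pages( 0, \%page_revs );
print "@$revision_ids\n";    # prints "12 15 87 301 430"
```

On the same illustrative data, the original range would list every id from 0 to 500, while only five of them belong to the tracked pages.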

Event Timeline

hexmode created this task. Sep 14 2019, 6:42 PM
hexmode created this object in space S3 Public NicheWork.
hexmode triaged this task as Normal priority.
hexmode updated the task description. Sep 15 2019, 3:39 PM

This is demonstrated by the following:

git clone -c remote.origin.pages='Atlanta Toledo ' mediawiki::http://asyncwiki-moon.wmflabs.org/demo/ async

which should fetch the histories of only those two pages.
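The trailing space in 'Atlanta Toledo ' does not create a third, empty page as long as empty fields are discarded when the setting is split. A small sketch, assuming for illustration only that the value is split on whitespace; the helper `tracked_pages` is hypothetical and not taken from MABS:

```perl
use strict;
use warnings;

# Hypothetical helper: turn the raw remote.origin.pages value into a list
# of page titles. Splitting on runs of whitespace and discarding empty
# fields means leading/trailing spaces do not yield bogus empty titles.
sub tracked_pages {
    my ($raw) = @_;
    return grep { length } split /\s+/, $raw;
}

my @pages = tracked_pages('Atlanta Toledo ');
print scalar(@pages), " pages: @pages\n";    # prints "2 pages: Atlanta Toledo"
```

Under this scheme a title containing a space would have to be written with underscores, which MediaWiki treats as equivalent to spaces in page titles.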

We need a way to populate the revs to fetch when we are not cloning the whole repository.
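One way to choose the revisions to fetch, sketched under stated assumptions: the sub name `revs_to_fetch`, its arguments, and the `$revs_for_pages` callback (standing in for a real per-page history lookup) are all hypothetical. When no pages are configured, the whole wiki is being cloned and the contiguous range is still appropriate; otherwise only the tracked pages' revisions are used.

```perl
use strict;
use warnings;

# Hypothetical dispatcher. $tracked is an array ref of page titles (empty
# when cloning the whole wiki); $revs_for_pages is a code ref that returns
# an array ref of revids for the given titles newer than $fetch_from.
sub revs_to_fetch {
    my ( $fetch_from, $last_remote, $tracked, $revs_for_pages ) = @_;
    if ( !@$tracked ) {
        # Whole-wiki clone: every revid in the range may belong to us.
        return [ $fetch_from .. $last_remote ];
    }
    # Subset clone: ask only about the tracked pages.
    return $revs_for_pages->( $fetch_from, $tracked );
}

# Stub standing in for a real per-page history lookup (illustrative data).
my %history = ( Atlanta => [ 12, 87 ], Toledo => [15] );
my $lookup  = sub {
    my ( $from, $pages ) = @_;
    return [ sort { $a <=> $b }
             grep { $_ > $from } map { @{ $history{$_} } } @$pages ];
};

my $subset = revs_to_fetch( 0,   500, [ 'Atlanta', 'Toledo' ], $lookup );
my $whole  = revs_to_fetch( 495, 500, [], $lookup );
print "@$subset\n";    # prints "12 15 87"
print "@$whole\n";     # prints "495 496 497 498 499 500"
```

Keeping the range branch for whole-wiki clones avoids one history query per page when every revision is wanted anyway.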

hexmode changed the visibility from "All Users" to "Public (No Login Required)". Sep 15 2019, 3:43 PM
hexmode closed this task as Resolved. Sep 15 2019, 11:00 PM

With the work done today, this is fixed: at least when only a few pages are specified, only those pages are fetched.

Note, though, that we now face T301 (MABS: Fix double-fetch of pages).