[lfs-dev] Latest packages
dueffert at uwe-dueffert.de
Sat May 4 02:20:31 PDT 2013
On Fri, 3 May 2013, Bruce Dubbs wrote:
> I'm going to write a program to automatically identify out of date
> packages for LFS. Has anyone already done such a beast?
I've been doing something like that for a couple of years now (including some BLFS and
even Windows stuff as well ;-]). I started with a bunch of bash scripts
that basically parsed certain maintainer websites with certain regexps.
This was quite hard to read, neither fast nor flexible, and always out of date.
Current solution (which has kept me happy for quite some years):
All parsing stuff is done by a simple single C(++?) program now.
It basically follows _all_ links and handles general stuff like stripping
common extensions (*.tgz etc) or an appended "/download" and replacing
"/from/a/mirror" by "/from/this/mirror".
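The normalization step described above might look roughly like this; a minimal sketch, assuming the three rewrites mentioned (extension stripping, an appended "/download", and the mirror path) — function names and the exact extension list are my own, not from the original program:

```cpp
#include <string>

// Sketch of the link normalization described above (names invented here):
// rewrite mirror redirector paths, drop a trailing "/download", and strip
// common archive extensions so foo.tar.gz and foo.tar.xz reduce to one key.
inline bool ends_with(const std::string& s, const std::string& suf) {
    return s.size() >= suf.size() &&
           s.compare(s.size() - suf.size(), suf.size(), suf) == 0;
}

inline std::string normalize_link(std::string url) {
    // "/from/a/mirror" -> "/from/this/mirror" (SourceForge-style rewrite).
    const std::string from = "/from/a/mirror";
    std::string::size_type p = url.find(from);
    if (p != std::string::npos)
        url.replace(p, from.size(), "/from/this/mirror");
    // Drop an appended "/download" (again SourceForge-style).
    const std::string dl = "/download";
    if (ends_with(url, dl))
        url.erase(url.size() - dl.size());
    // Strip common archive extensions (*.tgz etc.).
    const char* exts[] = {".tar.gz", ".tgz", ".tar.xz", ".tar.bz2", ".zip"};
    for (const char* e : exts) {
        std::string ext(e);
        if (ends_with(url, ext)) {
            url.erase(url.size() - ext.size());
            break;
        }
    }
    return url;
}
```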
As basic input it gets a list of simple rules to look for:
$packagename $starturl $pattern, e.g.:
mpc http://www.multiprecision.org/?prog=mpc&page=download tar.gz
check http://sourceforge.net/projects/check/files/check/ /tar.gz/download
$pattern in most cases only specifies the (sub/parent)directory depth to
search in (number of leading slashes) and the extension (or better: end)
of the links to look for there. It usually does not filter for any kind of
naming or versioning scheme. As a result I get a list of
directories/websites searched in and a list of URLs to potentially download.
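My reading of the $pattern field above, sketched in code (the struct and parsing are hypothetical, only the interpretation — leading slashes give the directory depth, the rest is the required link ending — comes from the description):

```cpp
#include <string>

// Hypothetical representation of one "$packagename $starturl $pattern"
// rule: leading slashes in the pattern give the (sub/parent)directory
// depth to search, and the remainder is the ending links must match.
struct Rule {
    std::string package;
    std::string start_url;
    int depth;           // number of leading slashes in $pattern
    std::string suffix;  // rest of $pattern, e.g. "tar.gz"
};

inline Rule parse_rule(const std::string& pkg, const std::string& url,
                       const std::string& pattern) {
    Rule r{pkg, url, 0, ""};
    std::string::size_type i = 0;
    while (i < pattern.size() && pattern[i] == '/') { ++r.depth; ++i; }
    r.suffix = pattern.substr(i);
    return r;
}
```

So the "check" rule from the example would descend one directory level and keep links ending in "tar.gz/download", while the "mpc" rule stays at depth zero and keeps links ending in "tar.gz".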
This would include following uninteresting links (such as parent dirs or
adverts or subdirs of outdated versions or subdirs of packages I'm not
interested in). Therefore I keep a list of fully qualified
directories/websites not to be searched by the above C program again, e.g.:
This would give me a list of package URLs, but include stuff that I'm not
interested in (which just happens to come from the same directory/site) or
stuff that I already have. Therefore I keep a list of such done packages
with certain extensions stripped (to avoid getting a tar.gz again as a tar.xz).
The C program has those 3 lists (currently 24KB commented rules, 120KB
dirs done, 230KB packages done) in memory and can therefore filter results.
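The in-memory filtering could be as simple as two set lookups; a sketch under the assumption that the skip-dirs and done-packages lists are loaded into sets keyed the same way the candidate URLs are (the real program's data layout isn't described in the post):

```cpp
#include <set>
#include <string>
#include <vector>

// Sketch of the in-memory filter over the lists described above
// (structure and names are assumptions, not the original code).
struct Filter {
    std::set<std::string> skip_dirs;      // dirs/sites not to search again
    std::set<std::string> done_packages;  // extension-stripped package keys

    // Keep only candidates that are neither in a skipped directory nor
    // already recorded as done. Candidates are assumed pre-normalized.
    std::vector<std::string> keep(const std::vector<std::string>& candidates,
                                  const std::string& dir) const {
        std::vector<std::string> fresh;
        if (skip_dirs.count(dir)) return fresh;
        for (const std::string& url : candidates)
            if (!done_packages.count(url))
                fresh.push_back(url);
        return fresh;
    }
};
```

At the sizes quoted (a few hundred KB of rules and history), plain in-memory sets make every lookup effectively free compared to the network fetches.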
[You can add further sanity checks, like remembering when a certain rule last
produced any package URLs at all, or last produced new package URLs, as a hint
to take a look at whether the maintainer changed website, extension or subdir
structure.]
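That sanity check might be tracked per rule like this; a sketch with invented field names, assuming one timestamped record per package rule:

```cpp
#include <map>
#include <string>

// Per-rule bookkeeping for the sanity check suggested above: remember
// when a rule last matched anything, and when it last yielded a URL we
// hadn't seen before (field names are my own invention).
struct RuleStats {
    long last_hit = 0;  // last run (e.g. Unix time) with any matching URL
    long last_new = 0;  // last run that yielded a previously unseen URL
};

inline void record(std::map<std::string, RuleStats>& stats,
                   const std::string& pkg, long now,
                   bool any_match, bool new_url) {
    RuleStats& s = stats[pkg];
    if (any_match) s.last_hit = now;
    if (new_url)   s.last_new = now;
}

// A rule that has matched nothing for `limit` time units is suspicious:
// the maintainer may have changed site, extension, or subdir layout.
inline bool stale(const RuleStats& s, long now, long limit) {
    return now - s.last_hit > limit;
}
```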
So I automatically get a list of subdirs currently searched (and may
exclude older versions, new uninteresting packages, or new adverts from
further searches), and I automatically get a list of new package URLs that I
may either want to download or just mark as done (for skipping missed
intermediate versions or by-catch of packages I'm not interested in).
Example: current list of new package URLs that I might potentially be
interested in downloading:
Surely not perfect, but easy to maintain and does the job for me...