SimpleFileExtractor

This program will extract files out of a set of FileCollections. The result is a new FileCollection which will be a flat set of all the extracted files.

Files are specified using standard UNIX file 'patterns' (or globs). These are not regular expressions.

File patterns may contain:

*
An asterisk will match zero, one, or many characters in a filename or in a path component. It will not match the directory separator "/".
?
A question mark will match a single character in a filename or in a path component.
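
As an illustration of these two wildcards, here is a minimal Python sketch that translates a pattern into a regular expression. It is not the program's actual implementation; the names glob_to_regex and glob_match are hypothetical. It also assumes that "?" does not match the directory separator "/", just like "*".

```python
import re


def glob_to_regex(pattern):
    """Translate a file pattern into a compiled regular expression.

    Hypothetical helper: only '*' and '?' are treated as wildcards,
    and neither is allowed to match the directory separator '/'.
    """
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append("[^/]*")   # zero or more characters, never '/'
        elif ch == "?":
            parts.append("[^/]")    # exactly one character, never '/' (assumption)
        else:
            parts.append(re.escape(ch))  # every other character is literal
    return re.compile("".join(parts))


def glob_match(pattern, path):
    """Return True if the whole path matches the pattern."""
    return glob_to_regex(pattern).fullmatch(path) is not None
```

With this sketch, glob_match("*/s*/*.txt", "Dataset1/sales/sales.txt") is true, while glob_match("*/*.txt", "Dataset1/sales/sales.txt") is false because "*" cannot span two path components.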

File patterns will be matched against all the files in all the selected input FileCollections. Files that match will be extracted and copied into the output FileCollection as a flat list (no subdirectories will be created in it).

All file patterns must include a first component that will be matched with the base name of the FileCollection itself. That means a pattern will always include at least one "/" character in it. This can be used to write patterns that will match only a subset of the input FileCollections.

This program will not extract:

For performance reasons, the program will ensure at execution time that the input files are located on a DataProvider configured as local storage, to avoid having to copy the entire contents of the file collections before extraction.

Examples

Assume we have selected three FileCollections named "Dataset1", "CivetOut2" and "BIDS3", and that they contain these files:

Content of all three collections:

Dataset1/README.txt
Dataset1/sales/sales.csv
Dataset1/sales/sales.txt
Dataset1/reports/errors.txt
Dataset1/reports/errors.pdf
CivetOut2/native/subject.mnc.gz
CivetOut2/thickness/thick_subject_30mm.txt
CivetOut2/thickness/thick_subject_40mm.txt
CivetOut2/surfaces/surf_subject_30mm.txt
CivetOut2/surfaces/surf_subject_40mm.txt
BIDS3/Report.txt
BIDS3/sub-1234/README.txt
BIDS3/sub-1234/anat/sub-1234.nii.gz

The following patterns will each extract these files:

Pattern: *D*/R*.txt
Files matched: Dataset1/README.txt, BIDS3/Report.txt
Resulting files in output: README.txt, Report.txt

Pattern: */s*/*.txt
Files matched: Dataset1/sales/sales.txt, CivetOut2/surfaces/surf_subject_30mm.txt, CivetOut2/surfaces/surf_subject_40mm.txt, BIDS3/sub-1234/README.txt
Resulting files in output: sales.txt, surf_subject_30mm.txt, surf_subject_40mm.txt, README.txt

Pattern: */*/*30mm*
Files matched: CivetOut2/thickness/thick_subject_30mm.txt, CivetOut2/surfaces/surf_subject_30mm.txt
Resulting files in output: thick_subject_30mm.txt, surf_subject_30mm.txt
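
The three examples can be reproduced with a short Python sketch. It is only an approximation of the behavior described here, not the program's own code: the names FILES, matches and extract are hypothetical, and the matching is delegated to Python's fnmatchcase applied one path component at a time, so that a wildcard can never cross a "/". Note that fnmatchcase also gives special meaning to "[...]" brackets, which this document does not describe; none of the example patterns use them.

```python
from fnmatch import fnmatchcase
from posixpath import basename

# The example listing from this section (hypothetical variable name).
FILES = [
    "Dataset1/README.txt",
    "Dataset1/sales/sales.csv",
    "Dataset1/sales/sales.txt",
    "Dataset1/reports/errors.txt",
    "Dataset1/reports/errors.pdf",
    "CivetOut2/native/subject.mnc.gz",
    "CivetOut2/thickness/thick_subject_30mm.txt",
    "CivetOut2/thickness/thick_subject_40mm.txt",
    "CivetOut2/surfaces/surf_subject_30mm.txt",
    "CivetOut2/surfaces/surf_subject_40mm.txt",
    "BIDS3/Report.txt",
    "BIDS3/sub-1234/README.txt",
    "BIDS3/sub-1234/anat/sub-1234.nii.gz",
]


def matches(pattern, path):
    """Match a pattern against a path, one '/'-separated component at a time."""
    pattern_parts = pattern.split("/")
    path_parts = path.split("/")
    if len(pattern_parts) < 2:
        # A pattern needs a first component for the FileCollection name.
        raise ValueError("pattern must contain at least one '/'")
    if len(pattern_parts) != len(path_parts):
        return False  # wildcards never span a directory separator
    return all(fnmatchcase(part, pat)
               for pat, part in zip(pattern_parts, path_parts))


def extract(pattern, files=FILES):
    """Return (matched paths, flat output names) for one pattern."""
    matched = [f for f in files if matches(pattern, f)]
    return matched, [basename(f) for f in matched]


for pattern in ("*D*/R*.txt", "*/s*/*.txt", "*/*/*30mm*"):
    matched, output = extract(pattern)
    print("Pattern:", pattern)
    print("Files matched:", ", ".join(matched))
    print("Resulting files in output:", ", ".join(output))
```

The output names are simply the base names of the matched paths, which is what produces the flat list. How the real program handles two matched files with the same base name is not covered by this sketch.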