VFS how to read gzipped content from tar file

Ken Tanaka
I would like to create an uncompressed file from a compressed file
inside of a tar archive.

Can VFS allow me to do this in one step? I can get the compressed.gz
file from archive.tar as a file on disk, then decompress the gzip
file and delete the .gz version. If there is an example, tutorial
or book, online or in print, that would be great; I haven't found anything
like this yet.

Conceptually there is a tar file:

archive.tar
 +- tardir/
     +- content.txt.gz

I'd like to end up with an uncompressed file "content.txt".
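Outside of VFS, the same one-step result can be sketched with only the JDK, which has gzip support (java.util.zip.GZIPInputStream) but no tar reader. This is a swapped-in technique, not a Commons VFS solution, and a minimal sketch under stated assumptions: it walks the uncompressed tar stream using the classic ustar header layout (512-byte blocks, entry name at byte 0, octal size at byte 124), buffers the matching entry in memory and gunzips it, so no temporary .gz file is written. It does not handle long-name or pax extension headers or base-256 sizes, and it needs Java 9+ for readAllBytes. The class name TarGzEntryExtractor is hypothetical.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.zip.GZIPInputStream;

/**
 * JDK-only sketch (not Commons VFS): find one gzipped entry in an uncompressed
 * tar stream and return its decompressed bytes without writing the .gz to disk.
 * Assumes classic ustar/GNU headers with entry names under 100 bytes.
 */
public final class TarGzEntryExtractor {

    private static final int BLOCK = 512;

    private TarGzEntryExtractor() {}

    public static byte[] extractGzEntry(InputStream tarStream, String wanted) throws IOException {
        DataInputStream in = new DataInputStream(tarStream);
        byte[] header = new byte[BLOCK];
        while (true) {
            in.readFully(header);                  // EOFException on a truncated archive
            if (header[0] == 0) {                  // empty name: the zero blocks that end the archive
                throw new IOException("entry not found: " + wanted);
            }
            String name = field(header, 0, 100);   // NUL-terminated entry name
            long size = Long.parseLong(field(header, 124, 12).trim(), 8); // size is octal text
            byte[] data = new byte[(int) size];
            in.readFully(data);
            in.readFully(new byte[(int) ((BLOCK - size % BLOCK) % BLOCK)]); // skip padding to 512
            if (name.equals(wanted)) {
                try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(data))) {
                    return gz.readAllBytes();      // Java 9+
                }
            }
        }
    }

    /** Reads an ASCII header field up to its first NUL byte. */
    private static String field(byte[] header, int offset, int length) {
        int end = offset;
        while (end < offset + length && header[end] != 0) {
            end++;
        }
        return new String(header, offset, end - offset, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) throws IOException {
        // e.g. java TarGzEntryExtractor archive.tar tardir/content.txt.gz content.txt
        try (InputStream tar = new BufferedInputStream(new FileInputStream(args[0]))) {
            Files.write(Paths.get(args[2]), extractGzEntry(tar, args[1]));
        }
    }
}
```

Buffering the entry keeps the sketch short; for very large entries a bounded stream over the tar data would avoid holding the whole .gz in memory.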

I tried something like:

    FileObject gzTarFile =
        fsManager.resolveFile("tar:gz:/archive.tar!/tardir/content.txt.gz");

    LocalFile newFile = (LocalFile)
        fsManager.resolveFile("file:///destination/content.txt");
    newFile.copyFrom(gzTarFile, new AllFileSelector());

Thanks in advance for any advice
-Ken

The test program I'm working with follows:
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
package gov.noaa.eds.tryVfs;

import org.apache.commons.vfs.FileName;
import org.apache.commons.vfs.FileObject;
import org.apache.commons.vfs.FileSystemException;
import org.apache.commons.vfs.FileSystemManager;
import org.apache.commons.vfs.VFS;

/**
 * Try using VFS to read the content of a compressed (gz) file inside of
 * a tar file.
 */
public class App
{
    static FileSystemManager fsManager = null;
   
    public static void main( String[] args )
    {
        try {
            fsManager = VFS.getManager();
        } catch (FileSystemException ex) {
            ex.printStackTrace();
            return;  // nothing below can work without a file system manager
        }
       
        try {
            /* resolveFile OK */
            System.out.println("Resolve tar file:");
            FileObject tarFile = fsManager.resolveFile(
                    "tar:/extra/data/tryVfs/archive.tar");
           
            FileName tarFileName = tarFile.getName();
            System.out.println("  Path     : " + tarFileName.getPath());
            System.out.println("  URI      : " + tarFileName.getURI());
           
           
            /* resolveFile OK */
            System.out.println("Resolve gzip file inside tar file:");
            FileObject gzTarFile = fsManager.resolveFile(
                    "tar:file:///extra/data/tryVfs/archive.tar!/tarDir/content.txt.gz");
           
            FileName gzTarFileName = gzTarFile.getName();
            System.out.println("  Path     : " + gzTarFileName.getPath());
            System.out.println("  URI      : " + gzTarFileName.getURI());
           
           
            /* resolveFile has an error:
             * uncomment one of the // "file string" arguments for resolveFile below;
             * each of the strings I've tried has an /* error message * / after it
             */
            System.out.println("Resolve content of gzip file inside tar file:");
            FileObject contentFile = fsManager.resolveFile(
//                  "tar:gz:/extra/data/tryVfs/archive.tar!/tarDir/content.txt.gz"
                    /* Unknown message with code "Unknown message with code
                       "vfs.provider.tar/open-tar-file.error".". */

//                  "tar:gz:///extra/data/tryVfs/archive.tar!/tarDir/content.txt.gz"
                    /* Unknown message with code "Unknown message with code
                       "vfs.provider.tar/open-tar-file.error".". */

//                  "tar:file:gz:/extra/data/tryVfs/archive.tar!/tarDir/content.txt.gz"
                    /* URI "file:gz:///extra/data/tryVfs/archive.tar!/tarDir/content.txt.gz"
                       is not an absolute file name. */

//                  "tar:file:gz:/extra/data/tryVfs/archive.tar!///tarDir/content.txt.gz!/content.txt"
                    /* URI "file:gz:///extra/data/tryVfs/archive.tar!/tarDir/content.txt.gz"
                       is not an absolute file name. */

                    "tar:gz:///extra/data/tryVfs/archive.tar!/tarDir/content.txt"
                    /* Unknown message with code "Unknown message with code
                       "vfs.provider.tar/open-tar-file.error".". */
                );
           
            FileName contentFileName = contentFile.getName();
            System.out.println("  Path     : " + contentFileName.getPath());
            System.out.println("  URI      : " + contentFileName.getURI());
           
            /* copy uncompressed content to a new file */
//            LocalFile newFile = (LocalFile) fsManager.resolveFile(
//                    "file:///extra/data/tryVfs/content.txt");          
//            newFile.copyFrom(contentFile, new AllFileSelector());
        } catch (FileSystemException ex) {
            ex.printStackTrace();
        }
    } // main( String[] args )
}

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


Re: VFS how to read gzipped content from tar file

Philippe Poulard
Hi Ken,

Ken Tanaka wrote:
>
>    FileObject gzTarFile =
> fsManager.resolveFile("tar:gz:/archive.tar!/tardir/content.txt.gz");

Try this:

fsManager.resolveFile("gz:tar:/archive.tar!/tardir/content.txt.gz");

--
Regards,

               ///
              (. .)
  --------ooO--(_)--Ooo--------
|      Philippe Poulard       |
  -----------------------------
  http://reflex.gforge.inria.fr/
        Have the RefleX !

Re: VFS how to read gzipped content from tar file

Ken Tanaka
Thanks for the suggestion, but I'm getting a different error when I try
that:
org.apache.commons.vfs.FileSystemException: Could not resolve file "gz:tar:file:///extra/data/tryVfs/archive.tar!/!/".
        at org.apache.commons.vfs.provider.AbstractFileSystem.resolveFile(AbstractFileSystem.java:301)
        at org.apache.commons.vfs.provider.AbstractFileSystem.resolveFile(AbstractFileSystem.java:267)
        at org.apache.commons.vfs.provider.AbstractFileSystem.getRoot(AbstractFileSystem.java:242)
        at org.apache.commons.vfs.provider.AbstractLayeredFileProvider.createFileSystem(AbstractLayeredFileProvider.java:82)
        at org.apache.commons.vfs.provider.AbstractLayeredFileProvider.findFile(AbstractLayeredFileProvider.java:59)
        at org.apache.commons.vfs.impl.DefaultFileSystemManager.resolveFile(DefaultFileSystemManager.java:641)
        at org.apache.commons.vfs.impl.DefaultFileSystemManager.resolveFile(DefaultFileSystemManager.java:602)
        at org.apache.commons.vfs.impl.DefaultFileSystemManager.resolveFile(DefaultFileSystemManager.java:570)
        at gov.noaa.eds.tryVfs.App.main(App.java:51)
Caused by: java.lang.StringIndexOutOfBoundsException: String index out of range: -1
        at java.lang.String.substring(String.java:1768)
        at org.apache.commons.vfs.provider.compressed.CompressedFileFileObject.<init>(CompressedFileFileObject.java:48)
        at org.apache.commons.vfs.provider.gzip.GzipFileObject.<init>(GzipFileObject.java:39)
        at org.apache.commons.vfs.provider.gzip.GzipFileSystem.createFile(GzipFileSystem.java:42)
        at org.apache.commons.vfs.provider.AbstractFileSystem.resolveFile(AbstractFileSystem.java:296)
        ... 8 more


Here is the exact code corresponding to the above error:

    FileObject contentFile = fsManager.resolveFile(
            "gz:tar:///extra/data/tryVfs/archive.tar!/tardir/content.txt.gz"
        );

Philippe Poulard wrote:

> Hi Ken,
>
> Ken Tanaka wrote:
>>
>>    FileObject gzTarFile =
>> fsManager.resolveFile("tar:gz:/archive.tar!/tardir/content.txt.gz");
>
> Try this:
>
> fsManager.resolveFile("gz:tar:/archive.tar!/tardir/content.txt.gz");
>

--
= Enterprise Data Services Division ===============
| CIRES, National Geophysical Data Center / NOAA  |
| 303-497-6221                                    |
= [hidden email] =============================


Re: VFS how to read gzipped content from tar file

Ken Tanaka
To follow up: I never did get a direct extract of the gzipped content
from inside of a tar file, but took a multistep approach to get the
files I want.
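For the multistep approach, the "decompress, then delete the .gz" part needs nothing beyond java.util.zip. A minimal sketch, assuming the .gz entry has already been copied out of the tar to a local file; the class name GunzipFile and the example paths are hypothetical, and this is not necessarily how the wiki page does it. It requires Java 9+ for transferTo.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.zip.GZIPInputStream;

/** JDK-only sketch: decompress a local .gz file, then delete the .gz. */
public final class GunzipFile {
    private GunzipFile() {}

    public static void gunzip(Path gzFile, Path target) throws IOException {
        try (InputStream raw = Files.newInputStream(gzFile);
             InputStream in = new GZIPInputStream(raw);
             OutputStream out = Files.newOutputStream(target)) {  // creates or truncates target
            in.transferTo(out);   // Java 9+: copies until the end of the gzip stream
        }
        Files.delete(gzFile);     // only reached if decompression succeeded
    }

    public static void main(String[] args) throws IOException {
        // e.g. java GunzipFile /extra/data/tryVfs/content.txt.gz /extra/data/tryVfs/content.txt
        gunzip(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

Deleting only after the try block closes means a corrupt .gz is left in place for inspection instead of being lost.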

I've documented what I've come up with so far:

http://wiki.apache.org/jakarta-commons/ExtractAndDecompressGzipFiles

I started a VfsCookbook page in the wiki for people to contribute
examples to (hint, hint). I think that working examples of VFS are lacking.

-Ken

Re: VFS how to read gzipped content from tar file

Mark Fortner
In reply to this post by Ken Tanaka
You mentioned that you wanted to look into a tarball (gzipped tar file), but
the URL you gave was only for a tar file.  Something like this should work:

gz:tar:file:///extra/data/tryVfs/archive.tar.gz!/myfile.txt

Hope this helps,

Mark

On 10/31/07, Ken Tanaka < [hidden email]> wrote:

>
> Thanks for the suggestion, but I'm getting a different error when I try
> that:
> org.apache.commons.vfs.FileSystemException: Could not resolve file
> "gz:tar:file:///extra/data/tryVfs/archive.tar!/!/".
> ...

Re: VFS how to read gzipped content from tar file

Ken Tanaka
Actually, the tar file itself is not compressed; the files inside it are
gzipped. For example:

tar tvf archive.tar
drwxrwsr-x ktanaka/ktanaka   0 2007-10-30 12:45:26 tardir/
-rw-rw-r-- ktanaka/ktanaka  56 2007-10-30 12:44:37 tardir/content.txt.gz

I'd like to create a content.txt file directly from the above archive.tar.

Thanks for the posting, though; you gave me an idea to try that led to a
solution:

gz:tar:file:///extra/data/tryVfs/archive.tar!/tardir/content.txt.gz!content.txt
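One way to see why this name works and the earlier "gz:tar:///...archive.tar!/tardir/content.txt.gz" did not is to peel off one layer at a time. The sketch below is a simplified model, an assumption rather than the actual Commons VFS parser: the leading scheme names the outermost layer, everything up to the last "!" is the file that layer opens, and the remainder is the path inside it. The class and method names are hypothetical.

```java
/**
 * Simplified model (an assumption, not the Commons VFS parser) of how a
 * layered file name is taken apart one level at a time.
 */
public final class LayeredUriSplit {
    private LayeredUriSplit() {}

    /** Returns {scheme, file opened by that scheme, path inside it}. */
    public static String[] split(String uri) {
        int colon = uri.indexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("no scheme in " + uri);
        }
        String scheme = uri.substring(0, colon);
        String rest = uri.substring(colon + 1);
        int bang = rest.lastIndexOf('!');            // the LAST '!' belongs to the outermost layer
        if (bang < 0) {
            return new String[] {scheme, rest, ""};  // no '!': the whole remainder is the file
        }
        return new String[] {scheme, rest.substring(0, bang), rest.substring(bang + 1)};
    }

    public static void main(String[] args) {
        // gz opens tar:file:///extra/data/tryVfs/archive.tar!/tardir/content.txt.gz, reads content.txt
        System.out.println(String.join(" | ", split(
                "gz:tar:file:///extra/data/tryVfs/archive.tar!/tardir/content.txt.gz!content.txt")));
        // gz is handed the tar archive's root; /tardir/content.txt.gz becomes a path inside gz
        System.out.println(String.join(" | ", split(
                "gz:tar:///extra/data/tryVfs/archive.tar!/tardir/content.txt.gz")));
    }
}
```

Under this model the failing name asks the gzip layer to decompress the tar archive's root rather than content.txt.gz, which would be consistent with the earlier error naming "gz:tar:file:///extra/data/tryVfs/archive.tar!/!/".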

It was unclear to me from the Javadoc how to build up this name parameter
for FileSystemManager.resolveFile(name), although I see after the fact
that the "Zip, Jar and Tar" section of
http://commons.apache.org/vfs/filesystems.html
has a 5th example:
"tar:gz:http://anyhost/dir/mytar.tar.gz!/mytar.tar!/path/in/tar/README.txt".
Had I studied it, I might have deduced that multiple paths can be chained
together with "!" as a separator, while the file system designators
("file:", "tar:" and "gz:") are prepended onto the front in reverse order.

I'll update the example I started in the VFS wiki to reflect the much
simpler name:

http://wiki.apache.org/jakarta-commons/ExtractAndDecompressGzipFiles

-Ken


Mark Fortner wrote:

> You mentioned that you wanted to look into a tarball (gzipped tar
> file), but
> the URL you gave was only for a tar file. Something like this should work:
>
> gz:tar:file:///extra/data/tryVfs/archive.tar.gz!/myfile.txt
>
> Hope this helps,
>
> Mark
>
> On 10/31/07, Ken Tanaka < [hidden email]> wrote:
>> Thanks for the suggestion, but I'm getting a different error when I try
>> that:
>> org.apache.commons.vfs.FileSystemException: Could not resolve file
>> "gz:tar:file:///extra/data/tryVfs/archive.tar!/!/".
...

>>
>>
>>
>> Here is the exact code corresponding to the above error:
>> FileObject contentFile = fsManager.resolveFile(
>>
>> "gz:tar:///extra/data/tryVfs/archive.tar!/tardir/content.txt.gz"
>> );
>>
>> Philippe Poulard wrote:
>>> Hi Ken,
>>>
>>> Ken Tanaka wrote:
>>>> FileObject gzTarFile =
>>>> fsManager.resolveFile("tar:gz:/archive.tar!/tardir/content.txt.gz");
>>> Try this:
>>>
>>> fsManager.resolveFile("gz:tar:/archive.tar!/tardir/content.txt.gz");
>>>
>>
>

