How to list all file names from a ZIP/tar.gz file InputStream?

I currently use the S3 getObject method to obtain a ResponseInputStream for a compressed file and then process the archive with stream methods, similar to the following code:

ZipArchiveInputStream zipIn = s3.getZipIn();
ZipArchiveEntry entry;
while ((entry = zipIn.getNextZipEntry()) != null) {
    if (entry.isDirectory()) {
        continue;
    }
    long curFileSize = entry.getSize();
    // Copies the full content of the current entry into memory
    ByteArrayOutputStream byteOut = new ByteArrayOutputStream();
    zipIn.transferTo(byteOut);
    // do something
    String fileName = entry.getName();
}

I think my current approach will download the entire compressed file before executing my logic.

Now I have a special requirement: I only need the relative path of each file in the archive, not its content. I know that formats such as ZIP keep this metadata in dedicated header or trailer sections of the file. I have seen many simple ways to read file names from local files, but I am not sure whether something similar exists for a stream obtained over the network, one that would let me skip downloading the file contents and so avoid consuming a large amount of network bandwidth.

I know S3 supports partial downloads. Is there any reasonable solution or library to do this?
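
To make the idea concrete, here is a minimal sketch of what I have in mind for ZIP, assuming the archive is a plain (non-ZIP64) ZIP with UTF-8 entry names. The end-of-central-directory record sits in the last 22 to 65,557 bytes of the file and points at the central directory, so only those ranges need to be read. `RangeReader` is a hypothetical interface I made up for illustration; with S3 I assume it could be backed by getObject requests carrying a `bytes=start-end` range. The `inMemory` helper and `main` exist only to make the sketch runnable without S3.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

final class ZipTailLister {

    /** Hypothetical abstraction over ranged reads (for S3: assumed to map to ranged getObject calls). */
    interface RangeReader {
        long size() throws IOException;
        byte[] read(long offset, int length) throws IOException;
    }

    private static final int EOCD_SIG = 0x06054b50;      // end-of-central-directory signature
    private static final int CEN_SIG = 0x02014b50;       // central directory file header signature
    private static final int EOCD_MIN = 22;              // EOCD size without the archive comment
    private static final int MAX_TAIL = EOCD_MIN + 0xFFFF; // the comment is at most 65,535 bytes

    static List<String> listNames(RangeReader reader) throws IOException {
        long size = reader.size();
        int tailLen = (int) Math.min(size, MAX_TAIL);
        long tailStart = size - tailLen;
        ByteBuffer tail = ByteBuffer.wrap(reader.read(tailStart, tailLen)).order(ByteOrder.LITTLE_ENDIAN);

        // Scan backwards for the EOCD signature; accept it only if its comment length fits the tail.
        int eocd = -1;
        for (int i = tailLen - EOCD_MIN; i >= 0 && eocd < 0; i--) {
            if (tail.getInt(i) == EOCD_SIG
                    && (tail.getShort(i + 20) & 0xFFFF) == tailLen - i - EOCD_MIN) {
                eocd = i;
            }
        }
        if (eocd < 0) {
            throw new IOException("End-of-central-directory record not found");
        }

        int entryCount = tail.getShort(eocd + 10) & 0xFFFF;
        long cdSize = tail.getInt(eocd + 12) & 0xFFFFFFFFL;
        long cdOffset = tail.getInt(eocd + 16) & 0xFFFFFFFFL;
        if (entryCount == 0xFFFF || cdSize == 0xFFFFFFFFL || cdOffset == 0xFFFFFFFFL) {
            throw new IOException("ZIP64 archive: not handled by this sketch");
        }

        // Reuse the tail if it already contains the central directory, otherwise do one more ranged read.
        ByteBuffer cd = tail;
        int pos = (int) (cdOffset - tailStart);
        if (cdOffset < tailStart) {
            cd = ByteBuffer.wrap(reader.read(cdOffset, Math.toIntExact(cdSize))).order(ByteOrder.LITTLE_ENDIAN);
            pos = 0;
        }

        List<String> names = new ArrayList<>(entryCount);
        for (int n = 0; n < entryCount; n++) {
            if (cd.getInt(pos) != CEN_SIG) {
                throw new IOException("Bad central directory header at entry " + n);
            }
            int nameLen = cd.getShort(pos + 28) & 0xFFFF;
            int extraLen = cd.getShort(pos + 30) & 0xFFFF;
            int commentLen = cd.getShort(pos + 32) & 0xFFFF;
            // Assumption: names are UTF-8 (older archives may use CP437 instead).
            String name = new String(cd.array(), pos + 46, nameLen, StandardCharsets.UTF_8);
            if (!name.endsWith("/")) {   // directory entries end with '/', skip them
                names.add(name);
            }
            pos += 46 + nameLen + extraLen + commentLen;
        }
        return names;
    }

    /** Demo-only reader over a byte array, standing in for the remote object. */
    static RangeReader inMemory(byte[] data) {
        return new RangeReader() {
            public long size() { return data.length; }
            public byte[] read(long offset, int length) {
                return Arrays.copyOfRange(data, (int) offset, (int) offset + length);
            }
        };
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry("docs/"));
            zip.closeEntry();
            zip.putNextEntry(new ZipEntry("docs/readme.txt"));
            zip.write("hello".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        System.out.println(listNames(inMemory(bytes.toByteArray()))); // prints [docs/readme.txt]
    }
}
```

With this layout the transfer would be at most about 64 KiB for the tail plus the central directory itself, regardless of how large the compressed file contents are. Archives with data prepended before the first entry (for example self-extracting ZIPs) would need an offset correction that this sketch does not attempt.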

  • 2

    No, you have to download the file from S3 before you can read it.

  • 2

    There is no simpler way to get the entry names from a zip file than scanning the file. And tar.gz is an entirely different file format.
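
On the tar.gz point: a tar archive has no index, and gzip has to be decompressed from the start, so the names can only be collected by reading the stream sequentially. A minimal sketch of that scan, assuming plain ustar-style headers (GNU long-name and pax extended headers are not handled) and Java 9+; `TarGzNameLister` is a made-up name for illustration. It still transfers the whole object, but it never buffers file contents in memory:

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;

final class TarGzNameLister {
    private static final int BLOCK = 512; // tar headers and data are aligned to 512-byte blocks

    /** Returns the names of regular files; reads the whole stream but keeps only the names. */
    static List<String> listNames(InputStream gzipped) throws IOException {
        List<String> names = new ArrayList<>();
        try (InputStream tar = new GZIPInputStream(gzipped)) {
            byte[] header = new byte[BLOCK];
            // A header block starting with NUL belongs to the end-of-archive padding.
            while (tar.readNBytes(header, 0, BLOCK) == BLOCK && header[0] != 0) {
                String name = field(header, 0, 100);
                // POSIX ustar headers (magic "ustar\0") may carry a path prefix at offset 345.
                if ("ustar".equals(field(header, 257, 6))) {
                    String prefix = field(header, 345, 155);
                    if (!prefix.isEmpty()) {
                        name = prefix + "/" + name;
                    }
                }
                // The size field is octal ASCII text (base-256 sizes for huge files are not handled).
                String octal = field(header, 124, 12).trim();
                long size = octal.isEmpty() ? 0 : Long.parseLong(octal, 8);
                byte type = header[156];
                if (type == '0' || type == 0) {   // regular file; directories use '5'
                    names.add(name);
                }
                // Skip the entry data, rounded up to a whole number of blocks.
                skipFully(tar, (size + BLOCK - 1) / BLOCK * BLOCK);
            }
        }
        return names;
    }

    /** Reads a NUL-terminated text field of at most maxLen bytes. */
    private static String field(byte[] block, int offset, int maxLen) {
        int len = 0;
        while (len < maxLen && block[offset + len] != 0) {
            len++;
        }
        return new String(block, offset, len, StandardCharsets.UTF_8);
    }

    /** Discards exactly n bytes, failing if the stream ends early. */
    private static void skipFully(InputStream in, long n) throws IOException {
        byte[] buf = new byte[8192];
        while (n > 0) {
            int r = in.read(buf, 0, (int) Math.min(buf.length, n));
            if (r < 0) {
                throw new EOFException("Truncated tar archive");
            }
            n -= r;
        }
    }
}
```

Since the ResponseInputStream returned by getObject is an ordinary InputStream, it could presumably be passed to `TarGzNameLister.listNames(...)` directly.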
