I am currently using the S3 getObject method to obtain a ResponseInputStream for a compressed file, and then processing the archive through stream methods, similar to the following code:
ZipArchiveInputStream zipIn = s3.getZipIn();
ZipArchiveEntry entry;
while ((entry = zipIn.getNextZipEntry()) != null) {
    if (entry.isDirectory()) {
        continue; // skip directory entries
    }
    long curFileSize = entry.getSize();
    // copy the content of the current entry into memory
    ByteArrayOutputStream byteOut = new ByteArrayOutputStream();
    zipIn.transferTo(byteOut);
    // do something
    String fileName = entry.getName();
}
I think my current approach downloads the entire compressed file before my logic runs.
Now I have a special requirement: I only need the relative path and file name of each entry, not its content. I know that archive formats such as zip store this metadata in a header or trailer section of the file. I have seen many simple ways to read entry names from a local file, but I am not sure whether something similar exists for a stream obtained over the network, so that I can skip downloading the content and avoid consuming a large amount of network bandwidth.
I know S3 supports partial downloads. Is there any reasonable solution or library to do this?
No, you have to download the file from S3 before you can read it.
There is no simpler way to get the entry names from a zip file than scanning the file. And tar.gz is an entirely different file format.
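For illustration, scanning for names only could look roughly like the sketch below. It uses the JDK's java.util.zip.ZipInputStream rather than the Commons Compress class from the question, and the class and method names (ZipEntryNames, listEntryNames) are made up for this example. The stream is still read from start to finish, so the bytes still cross the network, but no entry content is buffered in memory.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper, not part of any library.
final class ZipEntryNames {

    /**
     * Scans a zip stream and returns the names of all non-directory entries.
     * Entry content is skipped, not stored, but it is still read from the stream.
     */
    static List<String> listEntryNames(InputStream in) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zipIn = new ZipInputStream(in)) {
            ZipEntry entry;
            // getNextEntry() closes the previous entry, which discards its remaining data
            while ((entry = zipIn.getNextEntry()) != null) {
                if (!entry.isDirectory()) {
                    names.add(entry.getName());
                }
            }
        }
        return names;
    }
}
```

You would pass it the ResponseInputStream returned by getObject, since that is a plain InputStream.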