Interesting read. Thanks for sharing.
Maybe it's due to a lack of experience, but I always treat tarballs like a loaded gun. I extract them inside an empty subdirectory in my home first just to be sure, and then move the data as required.
It is no fun having to clean up the mess left by an incorrect extraction.
I'd argue this is a design bug. Extracting into the current directory should either not be possible or be the exception (tar xvfz --current-directory). And I'm not singling out tar here: unzip, pkunzip, etc. all have this issue, and all have caused people data loss and worse because of this unsafe default behavior.
Unless there is only one folder in the archive and it's not overwriting anything; in that case it should be extracted into the current directory so you don't get nested dupes.
That might be true, but I've been using arc, tar, and pkzip since the 80s, and even then I lost work and had to clean up floppy disks because of this issue. I suppose the prudent thing is to list the files before decompressing.
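For example, something like this (archive name is a placeholder) shows what would land where before anything touches the disk:

    tar -tvf mystery.tar | less   # 't' lists without extracting; 'v' adds owner/mode/size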
I have 30 years of professional experience, and with one-off tarballs that I'm not deeply familiar with, this is usually what I do as well (certainly with a tarball that has a /usr-like structure inside of it). You're good.
Most distros have the useful package "atool", containing the command "aunpack", which does the least surprising thing for all archive types (without creating a duplicate root directory when one already exists).
Extracting archives directly into your system root as a superuser is in the same class of activity as piping curl output into your shell interpreter as a superuser: things that no one should ever do.
It's nice to check whether you're about to extract a 1 MB tarball into 2 TB of data before actually running out of disk space.
Most tar programs do prevent extracting tarballs containing absolute paths (like '/etc/passwd') or traversal paths (like '../../../etc/passwd'), but older tar programs still allow that. As do programs written in Go, because of course: https://github.com/golang/go/issues/55356
Overall, if your disk space is effectively infinite and you're using GNU tar or another recent tar, I think you can skip the 't' (list) step before doing a '-C' extraction into some safe directory.
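In other words, roughly this pattern, with the scratch path being whatever empty directory you trust (names are illustrative):

    mkdir -p /tmp/unpack                 # empty scratch directory
    tar -xvf mystery.tar -C /tmp/unpack  # extract there, not into $PWD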
The tar file format doesn't prevent you from specifying absolute paths in the archive. It's up to the tool extracting the archive to reject/ignore such paths.
I asked about options for GNU tar because there is a bit of strange behavior.
To add absolute paths to an archive, there is the "-P" option, and the man page says it works only when creating archives: "Don't strip leading slashes from filenames when creating archives".
To extract absolute paths from the archive, you need to add the "-C /" option, and although the tool prints "tar: Removing leading `/' from member names", it will still extract to the right place, because the paths become relative and -C puts them in the root.
However, if you add "-P" during the extraction (which is not mentioned in the man page), the "Removing leading `/'" message disappears.
So if this message bothers someone, "tar -C / -xPf file.tar" will cleanly extract absolute paths from the archive ;)
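A minimal round trip showing the behavior described above, assuming GNU tar and an illustrative path (the extraction needs write access to the target):

    tar -cPf abs.tar /etc/motd   # -P keeps the leading slash when creating
    tar -tf abs.tar              # lists /etc/motd with the slash intact
    tar -C / -xPf abs.tar        # extracts back in place, with no
                                 # "Removing leading `/'" message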
So anything that writes an absolute path there will do, including literally opening the file in a text editor and replacing the path by hand, because that whole header is just fixed-length ASCII with null-terminated strings.
(I mean, I assume the tar(1) command can do it too, but you don't need that; the format is dead simple, if weird.)
It's a good exercise to open one of these files in hexdump or something to get a feel for what's really going on inside... but yeah, GNU tar has -P / --absolute-names to just create them with leading slashes.
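For instance, the member name occupies the first 100 bytes of each 512-byte header, so the very start of the archive is readable as plain text (archive name is a placeholder):

    hexdump -C some.tar | head -n 4   # the file name shows up at offset 0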
Well... yes, you're not wrong. But this isn't really any different from downloading an installer and entering your password to permit the install to proceed.
So I assume that you also never do that; and the downloaded installer is even worse, because you can't easily look inside the executable to determine what it does.
Shocking news: installing software installs the software.
Maintaining that kind of monstrosity of shell scripts that has to work on all kinds of OSes and Linux distros must be a giant PITA compared to the simplicity of making an AppImage/Flatpak, a set of deb/rpm packages for the most popular distros, and clear instructions for port maintainers to do the same for Arch Linux and the BSDs.
One should never pipe curl output into `sudo bash`, but I think you've got things quite backwards here. Barring some extreme edge cases, putting binaries in /usr/bin, icons in /usr/share/icons, config files in /etc, libraries in /usr/lib, and so on is standard across the Linux world, and simple utilities that are just archives of binaries, docs, and auxiliary files can easily be deployed to most distros with a common install script.
This is far, far less complex than having to maintain and distribute an entire bundled runtime environment, deal with inconsistent behavior with the local config, etc. Flatpak and AppImage have their use cases, but by no means are they simpler than just putting binaries in the right places -- they are in fact an entire additional layer of complexity.
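A sketch of what such a common install script can look like; the tool name, the file list, and the reliance on GNU coreutils' `install -D` (which creates leading directories) are my assumptions:

    #!/bin/sh
    # Hypothetical installer for a prebuilt utility; adjust paths as needed.
    set -eu
    PREFIX="${PREFIX:-/usr}"
    install -Dm755 mytool     "$PREFIX/bin/mytool"
    install -Dm644 mytool.1   "$PREFIX/share/man/man1/mytool.1"
    install -Dm644 mytool.png "$PREFIX/share/icons/hicolor/48x48/apps/mytool.png"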
That's a reasonable practice, yes, if you're doing a manual direct install. In situations like this, though, I typically just write a quick PKGBUILD script so the package manager can manage it, which means pointing things at the direct /usr/* paths, not /usr/local/*.
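For the curious, a skeleton of that approach might look like the following; the package name, URL, and file list are all hypothetical:

    # PKGBUILD: wrap a prebuilt binary release so pacman owns the files.
    pkgname=mytool-bin
    pkgver=1.0.0
    pkgrel=1
    pkgdesc="Prebuilt mytool binaries"
    arch=('x86_64')
    url="https://example.com/mytool"
    license=('MIT')
    source=("https://example.com/mytool-$pkgver-linux-x86_64.tar.gz")
    sha256sums=('SKIP')

    package() {
      # point straight at /usr/*, not /usr/local/*
      install -Dm755 "$srcdir/mytool" "$pkgdir/usr/bin/mytool"
    }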
I don't recall ever seeing this approach with sudo. Is it really becoming popular?
It used to be just bash, e.g. `curl ... | bash`. But now there is nothing stopping you from putting sudo inside that bash script and betting, with high probability, that NOPASSWD: is configured as well ;)
It's funny how tar came out first. Then cpio came out, which was a lot better, but tar had the momentum and never lost it. I still find cpio more controllable and use it by preference.
My way of working with cpio is pretty elementary. I've never found anything in all my usage to shudder about. Possibly there are smarty-pants usages in there that could make you shudder, I don't know. But why choose smarty-pants if elementary is perfectly satisfactory?
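For anyone who hasn't used it, the elementary round trip looks something like this; the controllable part is that the file list comes from stdin, so you decide exactly what goes in (patterns are illustrative):

    find . -type f -name '*.conf' | cpio -ov > configs.cpio  # create from a list
    cpio -idv < configs.cpio                                 # extract; -d creates dirs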
I never understood why so few people use `pax` instead of `tar`. It has always seemed more user friendly to begin with.
With pax, the `-k` option will not overwrite existing files. Also, not using `-p o` or `-p e` will not preserve ownership or permissions. I don't know why you would extract while preserving rights, except to unpack binaries into a chroot; usually you would want to control the permissions yourself instead.
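Concretely, with the archive name as a placeholder:

    pax -f archive.tar        # neither -r nor -w: just list the contents
    pax -r -k -f archive.tar  # extract; -k refuses to overwrite existing files,
                              # and without -p e / -p o ownership isn't preserved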
Now that is funny: a few days ago I was downvoted to oblivion for stating the obvious, that nobody cares about POSIX anymore. And now you are telling me that all those downvoters are probably not running POSIX-following systems[1], either on their own desktops or on their servers.
[1] pax is the recommended archive utility in the POSIX Shell & Utilities section, while tar and cpio aren't mentioned.
Because tar is super versatile, old, and, as the name implies, creates an archive. It's useful for creating an archive to image something, and many times you want reproducible ownership and permissions.
Tar can do all the same things; it just requires options on either the archiving or the extraction end.
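For instance, with GNU tar (paths are placeholders):

    # normalize ownership so the archive is reproducible
    tar --owner=0 --group=0 --numeric-owner -cf image.tar rootfs/
    # extract preserving permissions (-p); run as root to keep ownership too
    tar -xpf image.tar -C /target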
Then this explains why. BTW, this is expected shell behavior, specified in the POSIX standard:
If a filename begins with a <period> ( '.' ), the <period> shall be explicitly
matched by using a <period> as the first character of the pattern or immediately
following a <slash> character. The leading <period> shall not be matched by:
* The <asterisk> or <question-mark> special characters
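A quick way to see it in action, assuming a directory containing one dotfile:

    $ touch .hidden visible
    $ echo *        # the asterisk skips the leading period
    visible
    $ echo .*       # an explicit leading period matches it
    . .. .hidden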
GNU tar was included as the standard system tar in FreeBSD beginning with FreeBSD 1.0.

This is a complete re-implementation based on the libarchive(3) library. It was first released with FreeBSD 5.4 in May, 2005.
My personal journey with FreeBSD began with version 5.3 (the first stable release on the 5th branch) in November 2004.
I was completely unaware of such a significant tar change, and apparently I didn't care at the time. However, the entire 5th branch was so revolutionary compared to the 4th branch that this change is just a drop in the bucket ;)