Using wget to recursively fetch a directory with arbitrary files in it

Question

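I have a web directory where I store some config files. I'd like to use wget to pull those files down and maintain their current structure. For instance, the remote directory looks like: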

http://mysite.com/configs/.vim/

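.vim holds multiple files and directories. I want to replicate that on the client using wget. I can't seem to find the right combination of wget flags to get this done. Any ideas?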

Milan Babuškov

Question edited Nov 7, 2008 at 10:22

Programming

shell wget

Nov 7, 2008 at 9:44

37 views

Sriram

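To download a directory recursively while rejecting index.html* files, and without recreating the hostname, the parent directories, or the rest of the remote directory structure: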

wget -r -nH --cut-dirs=2 --no-parent --reject="index.html*" http://mysite.com/dir1/dir2/data

116

0

Sean Villani

For anyone else having similar issues: wget follows robots.txt, which might not allow you to grab the site. No worries, you can turn it off:

wget -e robots=off http://www.example.com/

http://www.gnu.org/software/wget/manual/html_node/Robot-Exclusion.html

ma11hew28

Answer edited May 6, 2014 at 4:00

112

0

SamGoody

You should use the -m (mirror) flag, as it takes care not to mess with timestamps and recurses to unlimited depth.

wget -m http://example.com/configs/.vim/

If you add the points mentioned by others in this thread, it becomes:

wget -m -e robots=off --no-parent http://example.com/configs/.vim/
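
For reference, the wget manual describes -m as currently equivalent to -r -N -l inf --no-remove-listing. The sketch below spells that out inside a small wrapper. The name mirror_dir is an assumption made for illustration and is not part of wget.

```shell
# mirror_dir is a hypothetical helper name, not a wget feature.
# It expands -m into the flags the wget manual lists for it and adds
# the robots and --no-parent settings mentioned in this thread.
mirror_dir() {
    wget -r -N -l inf --no-remove-listing \
         -e robots=off --no-parent "$@"
}

# Usage (URL taken from the question):
# mirror_dir http://example.com/configs/.vim/
```

Writing the flags out makes it easier to drop one of them, for example -N, if timestamping is not wanted.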

36

0

Erich Eichinger

Here's the complete wget command that worked for me to download files from a server's directory (ignoring robots.txt):

wget -e robots=off --cut-dirs=3 --user-agent=Mozilla/5.0 --reject="index.html*" --no-parent --recursive --relative --level=1 --no-directories http://www.example.com/archive/example/5.3.0/

esote

Answer edited Mar 19, 2017 at 11:11

31

0

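If --no-parent does not help, you might use the --include option (an abbreviation of --include-directories, also available as -I).

Directory structure:

http:///downloads/good
http:///downloads/bad

And you want to download downloads/good but not the downloads/bad directory: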

wget --include downloads/good --mirror --execute robots=off --no-host-directories --cut-dirs=1 --reject="index.html*" --continue http:///downloads/good

7

0

Conor McDermottroe

wget -r http://mysite.com/configs/.vim/

works for me.

Perhaps you have a .wgetrc which is interfering with it?
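
One way to check for an interfering startup file is to list the candidates that actually exist. This is a minimal sketch: $WGETRC and ~/.wgetrc are documented by wget, while the two system-wide paths are common defaults and are an assumption about your installation. The helper name list_wgetrc is hypothetical.

```shell
# Print every wgetrc candidate that exists and is readable.
# $WGETRC and ~/.wgetrc are documented by wget; the system-wide paths
# below are common defaults and are an assumption about your install.
list_wgetrc() {
    for f in "${WGETRC:-}" "$HOME/.wgetrc" /etc/wgetrc /usr/local/etc/wgetrc; do
        if [ -n "$f" ] && [ -r "$f" ]; then
            printf '%s\n' "$f"
        fi
    done
}

# Usage:
# list_wgetrc
```

If a file shows up, temporarily renaming it and repeating the download is a quick test. Newer wget releases also document a --no-config option that skips startup files, if your version has it.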

5

0

prayagupd

To fetch a directory recursively with a username and password, use the following command:

wget -r --user=(put username here) --password='(put password here)' --no-parent http://example.com/
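
A password given with --password ends up in the shell history and is visible in the process list. As a hedged alternative, wget's --ask-password option prompts for it instead. The wrapper name fetch_dir_auth below is hypothetical and only serves the illustration.

```shell
# fetch_dir_auth USER URL
# Hypothetical helper name. --ask-password makes wget prompt for the
# password, so it does not appear in the shell history or process list.
fetch_dir_auth() {
    wget -r --no-parent --user="$1" --ask-password "$2"
}

# Usage:
# fetch_dir_auth alice http://example.com/
```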

RomSteady

Answer edited May 8, 2016 at 12:35

5

0

Jordan Gee

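You need just two flags: "-r" for recursion and "--no-parent" (or -np) so that wget does not go into '.' and '..', that is, it does not climb to the parent directory. Like this:

wget -r --no-parent http://example.com/configs/.vim/

That's it. It will download into the following local tree: ./example.com/configs/.vim. However, if you do not want the first two directories, add -nH to drop the hostname directory and --cut-dirs=1 to drop configs, along the lines suggested in earlier replies:

wget -r --no-parent -nH --cut-dirs=1 http://example.com/configs/.vim/

And it will download your file tree into ./.vim/ only.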
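
How -nH and --cut-dirs combine can be sketched with a small model of the path mapping. This is a simplified illustration based on the example table in the wget manual, not wget's actual code, and the helper name local_dir is hypothetical.

```shell
# local_dir HOST REMOTE_DIR CUT NOHOST
# Hypothetical helper that mimics how wget maps a remote directory to a
# local one: drop the first CUT path components (--cut-dirs=CUT) and
# omit the host directory when NOHOST is 1 (-nH).
local_dir() {
    host="$1"; dir="${2#/}"; n="$3"; nohost="$4"
    dir="${dir%/}"
    while [ "$n" -gt 0 ] && [ -n "$dir" ]; do
        case "$dir" in
            */*) dir="${dir#*/}" ;;   # drop the leading component
            *)   dir="" ;;            # last component removed
        esac
        n=$((n - 1))
    done
    if [ "$nohost" = 1 ]; then out="$dir"; else out="$host${dir:+/$dir}"; fi
    printf './%s\n' "$out"
}

local_dir example.com /configs/.vim/ 0 0   # ./example.com/configs/.vim
local_dir example.com /configs/.vim/ 2 0   # ./example.com
local_dir example.com /configs/.vim/ 1 1   # ./.vim
```

The second call shows that --cut-dirs alone never removes the hostname directory; only -nH does that.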

In fact, I took the first line of this answer straight from the [wget manual][1], which has a very clean example towards the end of section 4.3.

[1]: https://www.gnu.org/software/wget/manual/wget.html#Directory_002dBased-Limits

Jordan Gee

Answer edited Apr 9, 2018 at 6:02

2

0

devon

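Wget 1.18 may work better. For example, I got bitten by a bug in version 1.12, where

wget --recursive (...)

...only retrieved index.html instead of all files.

The solution was to notice some 301 redirects and try the new location; given the new URL, wget got all the files in the directory.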
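
To spot such a redirect before starting a recursive download, the response headers can be inspected. The following is a minimal sketch; last_location is a hypothetical helper that reads HTTP headers on standard input, and the URL in the usage line is a placeholder.

```shell
# last_location is a hypothetical helper: it reads HTTP response
# headers on stdin and prints the value of the last Location header.
last_location() {
    tr -d '\r' | awk 'tolower($1) == "location:" { loc = $2 } END { if (loc != "") print loc }'
}

# Usage (wget prints server headers on stderr, hence 2>&1):
# wget --spider --server-response http://example.com/dir 2>&1 | last_location
```

If the helper prints a URL, passing that URL to wget -r instead of the original one is worth a try.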

zb226

Answer edited Aug 23, 2017 at 12:47

1

0

rkok

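This version downloads recursively and doesn't create parent directories.

wgetod() {
    # Count the slashes in the URL path (scheme, host and trailing slash stripped).
    local NSLASH NCUT
    NSLASH="$(echo "$1" | perl -pe 's|.*://[^/]+(.*?)/?$|\1|' | grep -o / | wc -l)"
    # Cut every path component except the last one.
    NCUT=$((NSLASH > 0 ? NSLASH-1 : 0))
    wget -r -nH --user-agent=Mozilla/5.0 --cut-dirs=$NCUT --no-parent --reject="index.html*" "$1"
}

Usage:

Add to ~/.bashrc or paste into the terminal, then run:
wgetod "http://example.com/x/"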

1

0

kasperjj

You should be able to do it simply by adding -r:

wget -r http://stackoverflow.com/

1

0

pr-pal

The following options seem to be the perfect combination when dealing with recursive downloads:

wget -nd -np -P /dest/dir --recursive http://url/dir1/dir2

Relevant snippets from the man page, for convenience:

   -nd
   --no-directories
       Do not create a hierarchy of directories when retrieving recursively.  With this option turned on, all files will get saved to the current directory, without clobbering (if a name shows up more than once, the
       filenames will get extensions .n).

   -np
   --no-parent
       Do not ever ascend to the parent directory when retrieving recursively.  This is a useful option, since it guarantees that only the files below a certain hierarchy will be downloaded.

0

Jeremy Ruten · Accepted Answer · 2008-11-07T21:55:41+00:00

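Solution

You have to pass the -np/--no-parent option to wget (in addition to -r/--recursive, of course), otherwise it will follow the link in the directory index on my site to the parent directory. So the command would look like this:

wget --recursive --no-parent http://example.com/configs/.vim/

To avoid downloading the auto-generated index.html files, use the -R/--reject option: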

wget -r -np -R "index.html*" http://example.com/configs/.vim/
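
The quotes around "index.html*" matter: without them the shell may expand the pattern against files in the current directory before wget ever sees it. A minimal sketch of a reusable wrapper follows; the name fetch_dir is hypothetical.

```shell
# fetch_dir is a hypothetical helper name.
fetch_dir() {
    # The pattern is quoted so the shell passes it to wget unexpanded,
    # even if files matching index.html* exist in the current directory.
    wget -r -np -R "index.html*" "$@"
}

# Usage:
# fetch_dir http://example.com/configs/.vim/
```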

waldyrious

Answer edited Oct 4, 2017 at 9:53

913

0