18

How can I download a PDF and store it to disk using VB.NET or C#?

The URL of the PDF goes through some redirection before the final PDF is reached.

I tried the code below, but the PDF seems corrupted when I attempt to open it locally:

Dim PdfFile As FileStream = File.OpenWrite(saveTo)
Dim PdfStream As MemoryStream = GetFileStream(pdfURL)
PdfStream.WriteTo(PdfFile)
PdfStream.Flush()
PdfStream.Close()
PdfFile.Flush()
PdfFile.Close()
Bugs
Perplexed
  • Can you show the GetFileStream() function as well? – Joel Coehoorn May 26 '10 at 14:30
  • There's no need to call Flush() if you're going to call Close(). – Eric Mickelsen May 26 '10 at 14:31
  • Is the PDF itself corrupt? The code I posted works for the IRS-provided PDF. The other approach I can think of is to use WebClient's DownloadData method and then write the byte[] to disk. – Pat May 26 '10 at 15:15
  • Well, the PDF size is 4 KB, so it's definitely not downloaded properly. – Perplexed May 26 '10 at 15:25
  • OK, the problem seems to be the redirection. If I hard-code the end URL (the URL of the PDF after redirection completes), then it works (using the DownloadData method). The trouble is, how do I ensure that the document is downloaded after redirection completes? – Perplexed May 26 '10 at 15:48
  • The initial url is: http://www.blahblah.com/NLAAPI.dll/GetObject?ObjectID=34972180 And after a second or two it redirects to: http://www.blahblah.com/PDFs/BRABB39.PDF – Perplexed May 26 '10 at 16:01
  • If it's a script-based redirect, you will probably have to do some HTML parsing or something similar to extract the location it's redirecting to. – Pat May 26 '10 at 19:37
  • OK, so I looked at the contents of the requested URL and it is basically a login page. When I browse the link with my browser I do not get prompted for login because of a cookie, but I guess requesting the page through this code does not take into account the cookies on my machine. So, is there a way to attach the cookie to the WebClient request? Or can I (after posting the credentials programmatically) somehow get the resulting HttpWebResponse saved to a file? Many thanks. – Perplexed May 27 '10 at 11:38
  • Using CookieContainer with WebClient class: http://stackoverflow.com/questions/1777221/using-cookiecontainer-with-webclient-class and apparently the WebClient class follows redirections: http://www.eggheadcafe.com/tutorials/csharp/70511872-c3aa-4e92-a7d7-dd4b09881af5/make-the-webclient-class-follow-redirects-and-get-target-url.aspx although I couldn't find that in the MSDN documentation. – Andrew Morton May 18 '12 at 16:57
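The cookie approach discussed in the comments above can be sketched roughly as follows. This is a minimal sketch, not a confirmed fix for the asker's site: the class name `CookieAwareWebClient`, the helper `CookieDownloadExample.Download`, and the cookie name and value passed to it are all hypothetical, and the real cookie would have to be copied from a logged-in browser session or obtained by posting the credentials first. It relies on `WebClient.GetWebRequest` being overridable so that a `CookieContainer` can be attached to the underlying `HttpWebRequest`. Note that `AllowAutoRedirect` only covers HTTP 3xx redirects; a script-based redirect would still need the target URL extracted from the page.

```csharp
using System;
using System.Net;

// WebClient subclass that attaches one shared CookieContainer to every HTTP request it creates.
public class CookieAwareWebClient : WebClient
{
    public CookieContainer Cookies { get; private set; }

    public CookieAwareWebClient()
    {
        Cookies = new CookieContainer();
    }

    protected override WebRequest GetWebRequest(Uri address)
    {
        WebRequest request = base.GetWebRequest(address);
        HttpWebRequest http = request as HttpWebRequest;
        if (http != null)
        {
            http.CookieContainer = Cookies;  // send stored cookies and collect new ones
            http.AllowAutoRedirect = true;   // follow HTTP 3xx redirects (already the default)
        }
        return request;
    }
}

public static class CookieDownloadExample
{
    // Hypothetical helper: cookieName and cookieValue are placeholders for the site's real session cookie.
    public static void Download(string url, string saveTo, string cookieName, string cookieValue)
    {
        Uri uri = new Uri(url);
        using (CookieAwareWebClient client = new CookieAwareWebClient())
        {
            client.Cookies.Add(new Cookie(cookieName, cookieValue, "/", uri.Host));
            client.DownloadFile(uri, saveTo);
        }
    }
}
```

A call might look like `CookieDownloadExample.Download(pdfURL, saveTo, "SessionId", "...")`, where `pdfURL` and `saveTo` are the variables from the question and the cookie name and value are assumptions.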

3 Answers

22

You can use the WebClient class (System.Net namespace) to do this, which avoids any stream work on your side.

The following C# code grabs an IRS form and saves it to C:\Temp.pdf.

using(WebClient client = new WebClient())
{
    client.DownloadFile("http://www.irs.gov/pub/irs-pdf/fw4.pdf", @"C:\Temp.pdf");
}
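If the saved file will not open, one quick check is whether the server actually returned a PDF rather than an HTML page (a login or error page, for example). The sketch below is an illustration only: the names `PdfDownload`, `LooksLikePdf`, and `TrySavePdf` are hypothetical, and it assumes the file starts with the `%PDF-` signature at byte 0, which is the normal case but would reject a PDF that has stray bytes before its header.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

public static class PdfDownload
{
    // True when the buffer begins with the "%PDF-" signature that PDF files normally start with.
    public static bool LooksLikePdf(byte[] data)
    {
        byte[] signature = Encoding.ASCII.GetBytes("%PDF-");
        if (data == null || data.Length < signature.Length) return false;
        for (int i = 0; i < signature.Length; i++)
        {
            if (data[i] != signature[i]) return false;
        }
        return true;
    }

    // Downloads url and writes it to path only when the payload looks like a PDF.
    // Returns false without touching the disk when something else came back.
    public static bool TrySavePdf(string url, string path)
    {
        using (WebClient client = new WebClient())
        {
            byte[] data = client.DownloadData(url);
            if (!LooksLikePdf(data)) return false; // probably HTML: login page, error page, script redirect
            File.WriteAllBytes(path, data);        // creates the file or overwrites an existing one
            return true;
        }
    }
}
```

When `TrySavePdf` returns false, dumping the first few hundred bytes as text usually shows what the server sent instead.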
Pat
  • Tried that. Opening the PDF I get: "Adobe Reader could not open file because it is either not a supported file type or because the file has been damaged blah blah" – Perplexed May 26 '10 at 14:56
  • GetFileStream function:
      Protected Function GetFileStream(ByVal URL As String) As MemoryStream
          Dim _url As String = URL
          Dim _wb As WebClient = New WebClient
          Dim myBuffer() As Byte
          Dim _str As MemoryStream = Nothing
          Try
              myBuffer = _wb.DownloadData(_url)
              _str = New MemoryStream(myBuffer)
          Catch ex As Exception
              _str = Nothing
          End Try
          Return _str
      End Function
    – Perplexed May 26 '10 at 14:57
  • This can work. Make sure you set the security permission for IISUSER if you are putting your code in a web page. – Mah Jin Khai Aug 16 '16 at 07:09
7

You can also try the following code sample, which sends a PDF file from the server to the browser as a download:

 Response.ContentType = "Application/pdf"; 
 Response.AppendHeader("Content-Disposition", "attachment; filename=Test_PDF.pdf"); 
 Response.TransmitFile(Server.MapPath("~/Files/Test_PDF.pdf")); 
 Response.End(); 
Raaghav
0

How can you use client.DownloadFile when the URL points to a "showdocument.aspx" page?

Example: https://gaming.nv.gov/modules/showdocument.aspx?documentid=246

MarkRLV