
I'm trying to do a dead simple thing: change the encoding of files from anything to UTF-8 without BOM. I found several scripts that do this, and the only one that really worked for me is this one: https://superuser.com/questions/397890/convert-text-files-recursively-to-utf-8-in-powershell#answer-397915.

It worked as expected, but I need the generated files without a BOM. So I tried to modify the script a little, adding the solution given in this question: Using PowerShell to write a file in UTF-8 without the BOM

This is my final script:

foreach ($i in Get-ChildItem -Recurse) {
    if ($i.PSIsContainer) {
        continue
    }

    $dest = $i.Fullname.Replace($PWD, "some_folder")

    $Utf8NoBomEncoding = New-Object System.Text.UTF8Encoding($False)

    if (!(Test-Path $(Split-Path $dest -Parent))) {
        New-Item $(Split-Path $dest -Parent) -type Directory
    }

    get-content $i | out-file -encoding $Utf8NoBomEncoding -filepath $dest
}

The problem is that PowerShell returns an error when the encoding object created with New-Object System.Text.UTF8Encoding($False) is passed to Out-File, complaining about an invalid parameter:

Cannot validate argument on parameter 'Encoding'. The argument "System.Text.UTF8Encoding" does not belong to the set "unicode, utf7, utf8, utf32, ascii" specified by the ValidateSet attribute.

I wonder if I'm missing something, like a specific PowerShell version. I have never written a PowerShell script before, so I'm totally lost. I need to change the encoding of these files, and there are hundreds of them, so I'd rather not convert them by hand one by one.

I'm using PowerShell 2.0, which comes with Windows 7.

Thanks in advance!
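The ValidateSet in the error message quoted above is the clue. Out-File's -Encoding parameter accepts only one of the listed names, not a System.Text.Encoding object. This means the UTF8Encoding($False) instance has to be passed to a .NET method such as [System.IO.File]::WriteAllLines instead. Here is a minimal sketch that assumes a throwaway temp file. Test-HasUtf8Bom is a hypothetical helper name used only for this illustration:

```powershell
# UTF-8 encoding object that does not emit a byte order mark (BOM)
$Utf8NoBomEncoding = New-Object System.Text.UTF8Encoding($False)

# Throwaway file for the demonstration
$tmp = [System.IO.Path]::GetTempFileName()

# The .NET method takes the Encoding object directly (Out-File -Encoding would reject it)
$lines = [string[]]@('caf' + [char]0xE9)   # one line: "café"
[System.IO.File]::WriteAllLines($tmp, $lines, $Utf8NoBomEncoding)

# Hypothetical helper: $true if the file starts with the UTF-8 BOM bytes EF BB BF
function Test-HasUtf8Bom([string]$Path) {
    $bytes = [System.IO.File]::ReadAllBytes($Path)
    return ($bytes.Length -ge 3 -and $bytes[0] -eq 0xEF -and $bytes[1] -eq 0xBB -and $bytes[2] -eq 0xBF)
}

Test-HasUtf8Bom $tmp   # False: the file starts with 'c' (0x63), not with a BOM
```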

EDIT 1

I tried the following code, suggested by @LarsTruijens and other posts:

$Utf8NoBomEncoding = New-Object System.Text.UTF8Encoding($False)
foreach ($i in Get-ChildItem -Recurse) {
    if ($i.PSIsContainer) {
        continue
    }

    $dest = $i.Fullname.Replace($PWD, "some_folder")

    if (!(Test-Path $(Split-Path $dest -Parent))) {
        New-Item $(Split-Path $dest -Parent) -type Directory
    }

    $content = get-content $i
    [System.IO.File]::WriteAllLines($dest, $content, $Utf8NoBomEncoding)
}

This throws an exception about one of the parameters of WriteAllLines: "Exception calling 'WriteAllLines' with 3 arguments. The value can't be null. Parameter name: contents". The script does create all the folders, but they are all empty.

EDIT 2

An interesting thing about this error is that the "content" parameter is not null: if I output the value of the $content variable (using Write-Host), the lines are there. So why does it become null when passed to the WriteAllLines method?

EDIT 3

I've added a content check to the variable, so the script now looks like this:

$Utf8NoBomEncoding = New-Object System.Text.UTF8Encoding($False)
foreach ($i in Get-ChildItem -Recurse) {
    if ($i.PSIsContainer) {
        continue
    }

    $dest = $i.Fullname.Replace($PWD, "some_folder")

    if (!(Test-Path $(Split-Path $dest -Parent))) {
        New-Item $(Split-Path $dest -Parent) -type Directory
    }

    $content = get-content $i

    if ( $content -ne $null ) {

        [System.IO.File]::WriteAllLines($dest, $content, $Utf8NoBomEncoding)
    }
    else {
        Write-Host "No content from: $i"
    }
}

Now every iteration prints the "No content from: $i" message, even though the files are not empty. There is one more error: Get-Content: cannot find path 'C:\root\FILENAME.php' because it does not exist. It seems to be looking for the files in the root directory instead of in the subfolders: it gets the file names from the child folders, but tries to read them from the root.
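The path in that error is the current directory joined with the bare file name. This matches what the comments point out: $i is converted to just the file name when passed to Get-Content, so it is resolved against the current directory instead of the file's own folder. Here is a minimal sketch of the difference. It assumes a throwaway directory tree under the temp folder, and the enc-demo and sub folder names are illustrative:

```powershell
# Throwaway tree: <temp>\enc-demo\sub\a.txt
$root = Join-Path ([System.IO.Path]::GetTempPath()) 'enc-demo'
$sub  = Join-Path $root 'sub'
New-Item -ItemType Directory -Path $sub -Force | Out-Null
Set-Content -Path (Join-Path $sub 'a.txt') -Value 'hello'

Push-Location $root
try {
    foreach ($i in Get-ChildItem -Recurse) {
        if ($i.PSIsContainer) { continue }

        # A bare file name is resolved against the current directory (<temp>\enc-demo),
        # where a.txt does not exist
        $foundByName = Test-Path -LiteralPath (Join-Path (Get-Location) $i.Name)

        # FullName is the absolute path, so it works from any current directory
        $content = Get-Content -LiteralPath $i.FullName
    }
}
finally {
    Pop-Location
}

$foundByName   # False
$content       # hello
```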

EDIT 4 - Final Working Version

After some struggling, and following the advice I got here, especially from @LarsTruijens and @AnsgarWiechers, I finally made it work. I had to change the way I was getting the directory from $PWD and set some fixed names for the folders. After that, it worked perfectly.

Here it goes, for anyone who might be interested:

$Utf8NoBomEncoding = New-Object System.Text.UTF8Encoding($False)
$source = "path"
$destination = "some_folder"

foreach ($i in Get-ChildItem -Recurse -Force) {
    if ($i.PSIsContainer) {
        continue
    }

    $path = $i.DirectoryName -replace $source, $destination
    $name = $i.Fullname -replace $source, $destination

    if ( !(Test-Path $path) ) {
        New-Item -Path $path -ItemType directory
    }

    $content = get-content $i.Fullname

    if ( $content -ne $null ) {

        [System.IO.File]::WriteAllLines($name, $content, $Utf8NoBomEncoding)
    } else {
        Write-Host "No content from: $i"   
    }
}
darksoulsong
  • possible duplicate of [Using PowerShell to write a file in UTF-8 without the BOM](http://stackoverflow.com/questions/5596982/using-powershell-to-write-a-file-in-utf-8-without-the-bom). What made you believe `Out-File` and `[System.IO.File]::WriteAllLines()` were the same? They're not. – Ansgar Wiechers Sep 08 '13 at 16:25
  • @AnsgarWiechers About the link you pasted: I tried that and it did not work for me. Like I said, I'm not a PowerShell programmer and don't know much about it, so I don't know the difference between these two methods/functions, and that's why I'm looking for specialized help. – darksoulsong Sep 09 '13 at 21:54
  • I tested the code from that other answer, and it does exactly what you said you want. Please update your question with the code where you used `[System.IO.File]::WriteAllLines()`. – Ansgar Wiechers Sep 09 '13 at 22:06
  • @AnsgarWiechers I've updated the question! – darksoulsong Sep 09 '13 at 23:08
  • Please double-check if `$content` really is not `$null` for **all** iterations. Try something like this: `if ($content -ne $null) { [System.IO.File]::WriteAllLines($dest, $content, $Utf8NoBomEncoding) } else { Write-Host "no content from $i" }`. – Ansgar Wiechers Sep 10 '13 at 09:37
  • Thanks, @AnsgarWiechers. The question is updated with new info after the changes you suggested. – darksoulsong Sep 10 '13 at 10:06
  • `$i` is expanded to just the filename without the path. Try `Get-Content $i.FullName`. – Ansgar Wiechers Sep 10 '13 at 11:43
  • I believe you won't get the correct result, because you read the file content with the default encoding: `$content = get-content $i.Fullname`. In Windows PowerShell, Get-Content falls back to the system's ANSI code page when a file has no BOM, so if a file is already UTF-8 without a BOM, the script will garble its non-ASCII characters. – Shrike May 20 '14 at 10:36
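One way to address the decoding concern in the last comment is to choose the decoding yourself instead of relying on Get-Content's default. The sketch below is a heuristic, not a general encoding detector. It assumes every input file is BOM-marked, valid UTF-8, or in the legacy ANSI code page.

- [System.IO.File]::ReadAllText honours a UTF-8, UTF-16 or UTF-32 BOM.
- Without a BOM, a strict UTF8Encoding throws on invalid byte sequences.
- In that case the code falls back to [System.Text.Encoding]::Default. That is the ANSI code page in Windows PowerShell; on PowerShell 6+ it is UTF-8, so the fallback is less useful there.

Read-TextGuessEncoding is a hypothetical helper name:

```powershell
# Hypothetical helper: decode a text file whose encoding is not known in advance.
# Heuristic assumption: the file is BOM-marked, valid UTF-8, or ANSI.
function Read-TextGuessEncoding([string]$Path) {
    # UTF8Encoding(emitBOM = $false, throwOnInvalidBytes = $true)
    $strictUtf8 = New-Object System.Text.UTF8Encoding($false, $true)
    try {
        # ReadAllText still switches encoding if it finds a BOM;
        # otherwise it decodes as strict UTF-8 and throws on invalid bytes
        return [System.IO.File]::ReadAllText($Path, $strictUtf8)
    }
    catch {
        # Not valid UTF-8: assume the legacy ANSI code page (Windows PowerShell)
        return [System.IO.File]::ReadAllText($Path, [System.Text.Encoding]::Default)
    }
}

# Example: write the result back as UTF-8 without a BOM (illustrative paths)
# $text = Read-TextGuessEncoding 'C:\some\file.txt'
# [System.IO.File]::WriteAllText('C:\some\out.txt', $text, (New-Object System.Text.UTF8Encoding($false)))
```

Pass absolute paths: .NET resolves relative paths against the process directory, not PowerShell's current location.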

7 Answers

3

You didn't follow the whole answer linked there: you left out the WriteAllLines part.

$Utf8NoBomEncoding = New-Object System.Text.UTF8Encoding($False)
foreach ($i in Get-ChildItem -Recurse) {
    if ($i.PSIsContainer) {
        continue
    }

    $dest = $i.Fullname.Replace($PWD, "some_folder")

    if (!(Test-Path $(Split-Path $dest -Parent))) {
        New-Item $(Split-Path $dest -Parent) -type Directory
    }

    $content = get-content $i 
    [System.IO.File]::WriteAllLines($dest, $content, $Utf8NoBomEncoding)
}
Lars Truijens
  • Thx, but this gives me an error about one of the parameters for WriteAllLines: "Exception on calling 'WriteAllLines' with 3 arguments. The value can't be null". – darksoulsong Sep 09 '13 at 21:57
  • Please find out what parameter is null. Doesn't the error message tell you which one it is? Did you notice I moved $Utf8NoBomEncoding above the foreach? – Lars Truijens Sep 10 '13 at 08:41
  • Oh, I see in your question it is the contents parameter. $content = get-content $i should fill that variable. Please check whether you get any errors there. – Lars Truijens Sep 10 '13 at 08:44
  • I updated the question in the second edit. When I print the $content variable (using Write-Host), I can see the lines of the file stored there. Even so, I get this exception when I pass it to WriteAllLines. Any other way to debug this? – darksoulsong Sep 10 '13 at 09:39
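Get-Content produces no output at all for an empty file, or for a path it cannot find. In both cases $content ends up $null and WriteAllLines rejects it with the "contents" error. The sketch below assumes you want empty files copied as empty files rather than skipped. Wrapping the call in @() always yields an array, and -ErrorAction Stop makes a bad path fail loudly instead of silently producing $null. Convert-FileToUtf8NoBom is a hypothetical helper name:

```powershell
$Utf8NoBomEncoding = New-Object System.Text.UTF8Encoding($False)

# Hypothetical helper: rewrite one file as UTF-8 without a BOM.
# .NET resolves relative paths against the process directory, so pass absolute paths.
function Convert-FileToUtf8NoBom([string]$SourcePath, [string]$DestinationPath) {
    # @() turns "no output" into an empty array instead of $null, and
    # -ErrorAction Stop turns a wrong path into a terminating error
    $content = @(Get-Content -LiteralPath $SourcePath -ErrorAction Stop)

    # WriteAllLines needs a string[]; an empty array produces an empty file
    [System.IO.File]::WriteAllLines($DestinationPath, [string[]]$content, $Utf8NoBomEncoding)
}
```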
2

Half of the answer is in the error message. It tells you the possible values the Encoding parameter accepts, and one of them is utf8.

... out-file -encoding utf8
Shay Levy
  • But if I add utf8 only the files are generated WITH BOM. I need them without it. – darksoulsong Sep 08 '13 at 14:47
  • Does this work? [byte[]]$content = [io.file]::ReadAllBytes('file.txt'); $Utf8NoBomEncoding = (New-Object System.Text.UTF8Encoding -ArgumentList $false).GetString($content); [io.file]::WriteAllText('file.txt', $Utf8NoBomEncoding) – Shay Levy Sep 08 '13 at 15:17
0

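I've made some fixes to the final version in the question:

  • Get-ChildItem now acts on $Source
  • the path replacement uses String.Replace, so $Source is not interpreted as a regex
  • paths are resolved with Resolve-Path
  • comment-based help, shown with -Help

and packaged everything into a script with cmdlet-style parameters: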

<#
    .SYNOPSIS
        Encode-Utf8

    .DESCRIPTION
        Re-Write all files in a folder in UTF-8

    .PARAMETER Source
        directory path to recursively scan for files

    .PARAMETER Destination
        directory path to write files to 
#>
[CmdletBinding(DefaultParameterSetName="Help")]
Param(
   [Parameter(Mandatory=$true, Position=0, ParameterSetName="Default")]
   [string]
   $Source,

   [Parameter(Mandatory=$true, Position=1, ParameterSetName="Default")]
   [string]
   $Destination,

   [Parameter(Mandatory=$false, Position=0, ParameterSetName="Help")]
   [switch]
   $Help   
)

if($PSCmdlet.ParameterSetName -eq 'Help'){
    Get-Help $MyInvocation.MyCommand.Definition -Detailed
    Exit
}

if($PSBoundParameters['Debug']){
    $DebugPreference = 'Continue'
}

$Source = Resolve-Path $Source

if (-not (Test-Path $Destination)) {
    New-Item -ItemType Directory -Path $Destination -Force | Out-Null
}
$Destination = Resolve-Path $Destination

$Utf8NoBomEncoding = New-Object System.Text.UTF8Encoding($False)

foreach ($i in Get-ChildItem $Source -Recurse -Force) {
    if ($i.PSIsContainer) {
        continue
    }

    $path = $i.DirectoryName.Replace($Source, $Destination)
    $name = $i.Fullname.Replace($Source, $Destination)

    if ( !(Test-Path $path) ) {
        New-Item -Path $path -ItemType directory
    }

    $content = get-content $i.Fullname

    if ( $content -ne $null ) {
        [System.IO.File]::WriteAllLines($name, $content, $Utf8NoBomEncoding)
    } else {
        Write-Host "No content from: $i"   
    }
}
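The point about -replace in the list above matters for Windows paths. -replace treats its pattern as a regular expression, and the backslashes in a path like C:\data\in are regex escapes. As a result, an unescaped path may fail to parse as a pattern or may not match the literal text. Here is a minimal sketch of two safe alternatives, using illustrative paths only:

```powershell
# Illustrative paths only
$source      = 'C:\data\in'
$destination = 'C:\data\out'
$file        = 'C:\data\in\sub\a.txt'

# Escaping the pattern makes -replace match the path literally.
# '$' is special in the replacement string, so it is escaped as well.
$viaRegex = $file -replace [regex]::Escape($source), $destination.Replace('$', '$$')

# String.Replace does a plain substring replacement with no regex at all
$viaString = $file.Replace($source, $destination)

$viaRegex    # C:\data\out\sub\a.txt
$viaString   # C:\data\out\sub\a.txt
```

The two differ in case sensitivity. -replace is case-insensitive by default, while String.Replace is case-sensitive, so with String.Replace the source prefix has to match the casing of FullName exactly.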
Darcon
0

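This approach first recreates the whole folder structure of the current directory in the destination folder (robocopy /e copies the subdirectories, and /xf *.* excludes the files), and then writes each file into it as UTF-8. The destination path of each file is obtained by replacing the current directory ($PWD) with the destination folder.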

$destination = "..\DestinationFolder"
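# Start from a clean destination (it does not exist yet on the first run)
if (Test-Path $destination) { Remove-Item $destination -Recurse -Force }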
robocopy $PWD $destination /e /xf *.*

foreach($i in Get-ChildItem -Recurse) {
    if ($i.PSIsContainer) {
        continue
    }
    $originalContent = $i.Fullname
    $dest = $i.Fullname.Replace($PWD, $destination)
    if (!(Test-Path $(Split-Path $dest -Parent))) {
        New-Item $(Split-Path $dest -Parent) -type Directory
    }
    get-content $originalContent | out-file -encoding utf8 -filepath $dest
}
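The answer above pre-creates the folders with robocopy, using /e /xf *.* to copy the directory tree while excluding the files. If robocopy is not available, or you prefer to stay in PowerShell, the same step can be sketched as follows. Copy-DirectoryTree is a hypothetical name, and it creates folders only. Note also that in Windows PowerShell, Out-File -Encoding utf8 writes a BOM, which the question is trying to avoid.

```powershell
# Hypothetical helper: recreate the folder structure of $SourceRoot under $DestRoot,
# creating directories only and copying no files
function Copy-DirectoryTree([string]$SourceRoot, [string]$DestRoot) {
    # Absolute file-system path of the source, without any provider prefix
    $SourceRoot = (Resolve-Path -LiteralPath $SourceRoot).ProviderPath

    foreach ($item in Get-ChildItem -Path $SourceRoot -Recurse -Force) {
        if (-not $item.PSIsContainer) { continue }

        # Path relative to the source root, e.g. 'sub\deeper'
        $relative = $item.FullName.Substring($SourceRoot.Length).TrimStart('\', '/')
        New-Item -ItemType Directory -Path (Join-Path $DestRoot $relative) -Force | Out-Null
    }
}
```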
jcromanu
0

I adapted a few snippets when I needed to convert a large number of log files to UTF-8.

Note: this should not be used with -Recurse.

write-host " "
$sourcePath = (get-location).path   # Use current folder as source.
# $sourcePath = "C:\Source-files"   # Use custom folder as source.
$destinationPath = (get-location).path + '\Out'   # Use "current folder\Out" as target.
# $destinationPath = "C:\UTF8-Encoded"   # Set custom target path

$cnt = 0

write-host "UTF8 conversion from" $sourcePath "to" $destinationPath

if (!(Test-Path $destinationPath)) {
  write-host "(Note: target folder created!)"
  new-item -type directory -path $destinationPath -Force | Out-Null
}

Get-ChildItem -Path $sourcePath -Filter *.txt | ForEach-Object {
  $content = Get-Content $_.FullName
  # Use the file name only; the FileInfo object itself may stringify to a full path
  Set-Content (Join-Path -Path $destinationPath -ChildPath $_.Name) -Encoding UTF8 -Value $content
  $cnt++
}
write-host " "
write-host "In total" $cnt "files converted!"
write-host " "
pause
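In Windows PowerShell, Set-Content -Encoding UTF8 (like Out-File -Encoding utf8) writes a BOM. If the files must be BOM-free, as in the question, the loop above can be sketched with a .NET writer instead. Convert-FolderToUtf8NoBom is a hypothetical name. Like the original, it is non-recursive and only handles *.txt files:

```powershell
# Hypothetical helper: non-recursive *.txt conversion to UTF-8 without a BOM.
# Returns the number of files written.
function Convert-FolderToUtf8NoBom([string]$SourcePath, [string]$DestinationPath) {
    $Utf8NoBomEncoding = New-Object System.Text.UTF8Encoding($False)
    New-Item -ItemType Directory -Path $DestinationPath -Force | Out-Null

    # .NET resolves relative paths against the process directory, so use an absolute one
    $DestinationPath = (Resolve-Path -LiteralPath $DestinationPath).ProviderPath

    $cnt = 0
    foreach ($file in Get-ChildItem -Path $SourcePath -Filter *.txt) {
        if ($file.PSIsContainer) { continue }
        # @() keeps empty files from turning into $null
        $lines  = @(Get-Content -LiteralPath $file.FullName)
        $target = Join-Path $DestinationPath $file.Name
        [System.IO.File]::WriteAllLines($target, [string[]]$lines, $Utf8NoBomEncoding)
        $cnt++
    }
    return $cnt
}
```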
-1

With:

 foreach ($i in Get-ChildItem -Path $source -Recurse -Force) {

only files in $source and its subfolders will be used, instead of everything under the current directory.

-1
  1. Go to the directory you want to convert: cd c:\MyDirectoryWithCrazyCharacterEncodingAndUnicode
  2. Fire this script away!

Copy and paste the script into your PowerShell window:

foreach($FileNameInUnicodeOrWhatever in get-childitem)
{
    # Skip subdirectories; only files can be converted
    if ($FileNameInUnicodeOrWhatever.PSIsContainer) { continue }

    $FileName = $FileNameInUnicodeOrWhatever.Name

    $TempFile = "$FileName.ASCII"

    # Write to the temporary file; writing to the file being read would truncate it before it is read
    get-content $FileName | out-file $TempFile -Encoding ASCII

    remove-item $FileName

    rename-item $TempFile $FileName

    write-output "$FileName converted to ASCII (via $TempFile)"
}
Transformer