In my last blog post, we discussed the problem of a folder data source and not knowing the source file name was that each row of data in the map came from.
In that post, the source file type was a text file, so it wasn’t terribly difficult to use .NET to read the folder file names and then modify the file to add that file name as a new column.
But what if our Folder Data Source was using an Excel file?
Some of what we learned from that previous article still applies, the code to find the files in that source folder is the same.
And if we leverage a bit of code from another of my previous posts, we had adjusted the output file column header on an Excel file destination.
With those two bits of previous code, I thought this was going to be an easy project. Well it ended up being a bit tricky with a couple of mistakes – writing/debugging this was more challenging than what I was shooting for.
And if we were going to use an Excel File Folder Data Source, I thought that we’d make it a bit more interesting to dynamically find the source folder from the SmartConnect tables as well as which sheet name it should use instead of hardcoding the paths as I did previously.
As we started off with the Customers.xlsx previously, we will again. But this time, we want a new FileName column that our customization code will populate. For that, we will manually edit the file and add the new column and save the new Template file.
Figure 1: FileName added manually to the Template
Our source files will not have this column (well unless you can modify the source generation routine to include this information – then you won’t need this script!) and will look the same but without the new column initially.
Figure 2:Sample source file without FileName column
If we think about the steps we need to complete, we would need:
- Run a SQL query to determine the source folder for this map (or we can cheat and just hard code it in the script)
- Run a SQL query to determine the Sheet Name that this folder data source map is using (or just hard code)
- Write a new with a loop to find all the xlsx files in the source folder
- Open each of those same Excel files somehow, see if the FileName column exists, and if not then add the column and populate each row.
To determine the source folder, we can query the FolderDataSource table.
Figure 3: Get the Source Folder path
In our script, we also tacked on the trailing back slash, we could have done it here or the ADD_COLUMNS_TO_SOURCE script but I thought it best to get the entire path with trailing slash here. Also notice that the results of this script are put into our custom global variable GBL_SOURCE_FOLDER.
The script to get the Sheet Name is similar and also put into a customer global variable – GBL_SHEET_NAME.
Figure 4: Get the Sheet Name selected on the folder data source
The SQL for this one was a bit trickier because the sheet name in the table is stored with the trailing $ at the end – such as Sheet1$. When using COM automation of Excel as we are going to do in this article, we only want the actual name as seen in Excel – Sheet1. As before, we could have easily removed it with .NET code in the ADD_COLUMNS_TO_SOURCE but thought it would be more efficient to take care of it on the SQL query right away by removing the last character. However that was a bit trickier than expected because the QueryText field is a ‘text’ field that that doesn’t support the Substring() function in SQL. So it had to be converted to a varchar first and then we could use the Substring() function.
Now that we have the source folder and sheet, we have the hard part left to do. The code to find the source files in that source folder, open them, and then modify them with the file name and finally re-save the data.
Figure 5: Where the magic happens
Some highlights of this code section include:
- Uses the global variables retrieved via the SQL Tasks
- Instantiates a COM interface to Excel. With .NET, using the .NET assemblies for Excel is preferred however this would entail using a new Script Namespace which affects ALL maps, not just this one. Because of that, using COM vs .NET interop is safer
- Uses System.File.IO to run a DirectoryInfo query to find all xlsx files in the source folder
- For each file found, loop through each rows and writing the current file name in a new FileName column that we would create. When finished, re-save the document.
- When finished with these pre-map tasks, the normal processing of the map would take over which would now include the new FileName column correctly populated.
Interesting things that we learned in this map
- Using COM automation on Excel to read/write the data
- How to find the source folder and sheet name by querying the SmartConnect database
You can download the map, the source files, as well as text files with copies of the scripts HERE.
Best Regards,
Patrick Roth
eOne