Task Factory Hadoop WebHDFS

Important: If your connection tests successfully but you receive an "unable to connect" error at runtime, you may need to update your local hosts file. See the following help document for details.

The Hadoop WebHDFS Connection Manager is available for SQL Server versions 2012 and higher.

Connection Manager

Hadoop WebHDFS Connection Manager

Used with the Hadoop WebHDFS Source.


Option - Description
WebHDFS Server Address - The fully qualified web address and port number where the HDFS (Hadoop Distributed File System) is located (example: http://192.168.1.10:50070).
Username - The username with permission to access HDFS files.
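
To make the two connection settings concrete, here is a minimal Python sketch of how a server address and username typically combine into a WebHDFS REST request URL. It follows the public WebHDFS convention (the /webhdfs/v1/ path prefix, the op parameter, and the user.name parameter); how Task Factory builds its requests internally is not described here, so treat this as an illustration only. The function name build_webhdfs_url and the username hdfs_user are hypothetical.

```python
from urllib.parse import quote, urlencode


def build_webhdfs_url(server_address, username, hdfs_path, op="OPEN"):
    """Build a WebHDFS REST URL (hypothetical helper, for illustration only)."""
    # Drop a trailing slash so the path is not doubled up.
    base = server_address.rstrip("/")
    # WebHDFS paths live under /webhdfs/v1/; percent-encode unsafe characters.
    path = quote(hdfs_path.lstrip("/"))
    # "op" selects the operation; "user.name" identifies the HDFS user.
    query = urlencode({"op": op, "user.name": username})
    return f"{base}/webhdfs/v1/{path}?{query}"


# Values taken from the examples in this document; "hdfs_user" is an assumption.
url = build_webhdfs_url(
    "http://192.168.1.10:50070", "hdfs_user", "FolderName/DataFile.txt"
)
print(url)
```

Opening such a URL in a browser or with a command-line HTTP client can be a quick way to check that the address, port, and username are valid before configuring the connection manager.
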
Hadoop WebHDFS Source


The Hadoop WebHDFS Source streams large files stored in the HDFS of a Hadoop server and converts them into rows of data within SSIS. Currently, the Hadoop WebHDFS Source supports only text and CSV files. See the Hadoop WebHDFS Connection Manager to learn more about setting up the connection manager.


Option - Description
File Name - The file name (if in the root directory) or path to the file stored within HDFS (example: FolderName/DataFile.txt).
Options
  • Data Contains Headers? - Similar to the native Flat File Source, this selection identifies the first row as containing column headers.
  • Row Delimiter - Identifies the character or line break (such as \n) that signifies a new row.
  • Column Delimiter - Identifies the character used to separate values for the different columns, such as a comma.
  • Text Qualifier - Identifies the character used to wrap values, such as quotation marks.
Output Columns - Users can create, remove, and configure the name, index (zero-based), data type, length, precision, and scale of the columns being extracted from the text file.
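
The following Python sketch illustrates how the header, delimiter, and text qualifier options interact when delimited text is split into rows and columns, and how output columns can be described by name, zero-based index, and data type. It uses Python's standard csv module as a stand-in and is not the component's actual implementation. The names parse_delimited and project, the Column0-style default names, and the sample data are all hypothetical.

```python
import csv
import io


def parse_delimited(text, has_headers=True, row_delimiter="\n",
                    column_delimiter=",", text_qualifier='"'):
    """Split delimited text into (header, rows). Illustrative sketch only."""
    # Simplification: normalize a custom row delimiter to "\n". This would also
    # rewrite the delimiter if it appeared inside a qualified value.
    if row_delimiter != "\n":
        text = text.replace(row_delimiter, "\n")
    reader = csv.reader(io.StringIO(text, newline=""),
                        delimiter=column_delimiter, quotechar=text_qualifier)
    rows = [row for row in reader if row]  # skip blank lines
    if not rows:
        return [], []
    if has_headers:
        return rows[0], rows[1:]
    # No header row: generate placeholder names using zero-based indexes.
    return [f"Column{i}" for i in range(len(rows[0]))], rows


def project(rows, columns):
    """Apply output column definitions given as (name, index, type) tuples."""
    return [{name: cast(row[index]) for name, index, cast in columns}
            for row in rows]


# Pipe-delimited sample; the qualifier protects the pipe inside the note value.
sample = 'id|name|note\n1|Ada|"likes | pipes"\n2|Grace|plain\n'
header, rows = parse_delimited(sample, column_delimiter="|")
typed = project(rows, [("id", 0, int), ("name", 1, str)])
print(header)
print(rows)
print(typed)
```

Note that the text qualifier is what allows a value to contain the column delimiter: in the sample, the quoted note stays in a single column even though it contains a pipe character.
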