Questions tagged [datastage]

DataStage is the ETL (Extract, Transform, Load) component of the IBM InfoSphere Information Server suite. It is a GUI-based client tool that lets users integrate various data sources and targets in an enterprise environment.

Data sources and targets can be database tables, flat files, datasets, CSV files, etc. The basic design paradigm consists of a unit of work called a DataStage job. Multiple jobs can be controlled and conditionally sequenced using 'Sequences'.

IBM® InfoSphere® DataStage® integrates data across multiple systems using a high performance parallel framework, and it supports extended metadata management and enterprise connectivity. The scalable platform provides more flexible integration of all types of data, including big data at rest (Hadoop-based) or in motion (stream-based), on distributed and mainframe platforms.


InfoSphere DataStage provides these features and benefits:

  • Powerful, scalable ETL platform
  • Support for big data and Hadoop
  • Near real-time data integration
  • Workload and business rules management
  • Ease of use

Support for big data and Hadoop

  • Includes support for IBM InfoSphere BigInsights, Cloudera, Apache and Hortonworks Hadoop Distributed File System (HDFS).
  • Offers Balanced Optimization for Hadoop capabilities to push processing to the data and improve efficiency.
  • Supports big-data governance, including features such as impact analysis and data lineage.

Powerful, scalable ETL platform

  • Manages data arriving in near real-time as well as data received on a periodic or scheduled basis.

  • Provides high-performance processing of very large data volumes.

  • Leverages the parallel processing capabilities of multiprocessor hardware platforms to help you manage growing data volumes and shrinking batch windows.

  • Supports heterogeneous data sources and targets in a single job including text files, XML, ERP systems, most databases (including partitioned databases), web services, and business intelligence tools.

Near real-time data integration

  • Captures messages from Message Oriented Middleware (MOM) queues using Java Message Services (JMS) or WebSphere MQ adapters, allowing you to combine data into conforming operational and historical analysis perspectives.

  • Provides a service-oriented architecture (SOA) for publishing data integration logic as shared services that can be reused over the enterprise.

  • Can simultaneously support the high-speed, high-reliability requirements of transactional processing and the large-volume bulk data requirements of batch processing.

Ease of use

  • Includes an operations console and interactive debugger for parallel jobs to help you enhance productivity and accelerate problem resolution.

  • Helps reduce the development and maintenance cycle for data integration projects by simplifying administration and maximizing development resources.

  • Offers operational intelligence capabilities, smart management of metadata and metadata imports, and parallel debugging capabilities to help enhance productivity when working with partitioned data.

507 questions
6
votes
3 answers

Is there a way to use User Activity Variables to store SQL in Datastage

I am considering using RCP to run a generic DataStage job, but the initial SQL changes each time it's called. Is there a process in which I can use a User Activity Variable to inject SQL from a text file or something so I can use the same…
arcee123
  • 539
  • 3
  • 32
  • 86
6
votes
3 answers

Copying 6000 tables and data from SQL Server to Oracle: fastest method?

I need to copy the tables and data (about 5 years of data, 6200 tables) stored in SQL Server. I am using DataStage with an ODBC connection, and DataStage automatically creates the tables with data, but it's taking 2-3 hours per table as the tables are very…
user218903
  • 251
  • 1
  • 3
  • 9
4
votes
2 answers

How to check if value in field is decimal and not string (DATASTAGE)?

How can I check whether the value in a field is a decimal and not a string in DataStage? I am using DataStage version 8.
Esti Gong
  • 41
  • 1
  • 2
4
votes
1 answer

'SQL1024N A database connection does not exist. SQLSTATE=08003' error while executing 'db2 -x' command through command stage in Datastage 9.1

I'm getting the error 'SQL1024N A database connection does not exist. SQLSTATE=08003' while executing the 'db2 -x' command through the Execute Command stage in DataStage 9.1 (AIX server). Can anyone please help me out?
3
votes
1 answer

How to pass output from a DataStage parallel job as input to another job?

My requirement: in Parallel Job 1, I extract data from a table. Parallel Job 2 should be triggered in the sequencer only when the row count from the source query in Job 1 is greater than 0. I want to achieve this without…
3
votes
1 answer

How to remove leading zeros in DataStage

I am trying to remove the leading zeros in a decimal field in a Sequential File stage. What is the solution to this problem?
Ley Gail
  • 55
  • 1
  • 1
  • 4
3
votes
2 answers

Restore all keyspaces and tables from Cassandra data folder

I have all keyspaces and tables copied from another Cassandra data folder. How can I restore them on my Cassandra node? I don't have the snapshots that are normally required to restore.
Vikas Kumar
  • 469
  • 5
  • 16
3
votes
2 answers

Fibonacci Sequence using Datastage

I'm trying to output the Fibonacci sequence in DataStage. I am trying it with a Row Generator --> Transformer --> Sequential File. The data inside my Row Generator is (0 and 1). I have no idea what to put in my Transformer. Data: 0,1 The output should…
3
votes
2 answers

Extracting DataStage job performance stats (start and finish times)

The DataStage version is 8.1. I have no direct access but need to give instructions to extract some job runtime stats for me. I believe that the repository is in a DB2 database, or maybe in flat files if that's still supported in 8.1. I can't install any…
Alex
  • 31
  • 1
  • 3
2
votes
1 answer

Is it possible to set up IBM DataStage through Docker?

I want to run IBM InfoSphere DataStage locally. Can I set it up through Docker? Are there any alternatives? Please suggest some.
Rakesh
  • 1,130
  • 1
  • 12
  • 17
2
votes
1 answer

How to alter session before running SQL through a DataStage job

I need to alter the session before executing the main SQL in the Oracle connector used in my DataStage job. I tried altering the session from the Before SQL tab as below, but it does not seem to work: alter session set star_transformation_enabled=TRUE; When I…
2
votes
1 answer

Steps to generate an OAuth2 token in the Hierarchical stage in DataStage

How can I generate an OAuth2 token in the Hierarchical stage in a DataStage job? What are the steps to do this?
2
votes
1 answer

Reverse engineering DataStage code into Pig (for Hadoop)

I have a landscape of DataStage applications that I want to reverse engineer into Pig, rather than having to write fresh Pig code and try to replicate the DataStage functionality. Has anyone had experience of doing something similar? Any tips on…
Steve
  • 21
  • 2
2
votes
1 answer

SQLState 02000 No row was found for FETCH, UPDATE, or DELETE

I'm running jobs through DataStage with the DELETE then INSERT connector. Several jobs are failing with this error: DB2_Connector: DB2 reported: SQLSTATE = 02000 Native Error Code = 100, Msg = IBM[CLIDriver][DB2/NT64] SQL01000W No row was…
Chris
  • 83
  • 1
  • 1
  • 10
2
votes
2 answers

IBM DataStage integration with Java

We have DataStage jobs and want to use a Java class that reads a file and returns some data. Can someone explain the steps needed to do this?
user509755
  • 2,731
  • 9
  • 41
  • 72