The Data Warehouse Pages

a Data Warehouse blog

lesson #02: forward ever, backward never

leave a comment »

Ok, I do Data Warehousing for financial institutions for a while, both in-line and as a consultant. I know that there are certain requirements outside, in the real world, that forces such companies to store histories of there data, and to be able to simulate the Truth as Was perspective. Regulatory requirements. They exist. Yes.

But that does not mean to build everything time dependent, to set up every master data with a capture of the slightest change within its attributes. No, in my past there was no, exactly zero, requirement for the simulation of Truth as Was. The business departments I used to work with were always interested in the as Is situation, and derive a proper forecast from this.

I don’t talk about time lines in your OLAP cube (which is a type of date attribute in your fact table). I talk about questions like: Show me the distribution of new contracts within the old field service structure! Who cares about that? You have a new structure, you have new revenues or applications. And if your KPI moves toward green things are right, and if not, one can not blame the new structure.

There is no need for the campaign management to know the ZIP code of last month. There is some sort of need to send the letter to the new postal address and use this micro demographic data.

Or am I totally wrong?

Written by Rayk Fenske

January 6, 2010 at 7:49 pm

Posted in Uncategorized

RSDMWB_OLD

with 2 comments

What the…? SAP BW is a strange, in this sense human system. Decades ago we migrated to SAP BW NetWeaver 2004s, which is known as BW7.0 since. Recently I wanted to delete a InfoObject, but wasn’t able due to its usage in a Data Mining model. Nothing easier than this: RSDMWB would help. Does not, no such model there to be found. Hmm? SDN helps and sugested the usage of RSDMWB_OLD, because pre 7.0 DM models could not be accessed using the new, but the old transaction. I entered the code, and the system dumped. I did the LOLITA (log out, log in, try again) approach…, and failed again. The OSS note was about to be created. Then heaven sent a hint. I usally work with German as logon language. “Try english” a suport team member told me. I did. I succesfully executed the transaction. I was able to delete the model, now I could continue with tiddying up the system.

Written by Rayk Fenske

January 5, 2010 at 3:24 pm

Posted in Meta, Tools

Tagged with , ,

lesson #01: trust the file

leave a comment »

I used to work with Informatica for several years, and the systems we usually received our data from where old HOST ones. So delivery took place in flat file form. Later one I moved “forward” to SAP BW in a SAP R/3 environment. Not 100% but most of the connected OLTP systems where R/3. And with R/3 comes the DataSource, both delivered by SAP as part of the Business Content (BC) and self created so called Generic Extractors. But there are several other options for getting data into a SAP based data warehouse:

  • Connect to a database
  • Receive SOAP messages
  • Use RFC function calls
  • Connect flat files (both server and client) .

So, we use a lot of the R/3 extractors, which, at the end, are real tight connections of OLTP sources and the data warehouse. What options do you have now? One could choose between a Delta of Full delivery of data. For Delta mode some continuously growing attribute must be specified. This is easy: One could start with a artificial ID, but will later move to some sort of changed-at time-stamp. The question now: Where are my deleted records? As it is not the aim of a OLTP system to store everything for ever things are deleted. The solution is a additional data source delivering information about deleted records, which are somehow “saved” during the deletion process. In Delta mode. One will not face this issue in case of a full extraction of data. You can do this ever, and ever again. No issue, as long as your operation is not blocked. But what kind of data you receive now? While extracting contract directly from OLTP, do I have all the contract holders and more important the corresponding version in time? No guarantee for this during direct access. Again, a solution is in place: Create buffer tables, which are filled at a certain moment in time, during a regular day end processing. Fill the buffer, ping the warehouse and load a full and consistent stock of data. You could do this again the same period, maybe day. The next day the buffer is overwritten. So, the chance to fix things up is gone. Another solution: Add technical driven tracking fields like status to the buffer table. Implement a handshake between OLTP and DWH. The new process looks like:

  1. End of day processing in OLTP
  2. Fill buffer tables, set status to NEW
  3. Ping the Warehouse
  4. Load all NEW data
  5. Perform checks
  6. Update buffer tables, set status to RECEIVED

Now, in case of error, status RECEIVED could be reset to NEW, the process starts again. For normal processing all records marked as received could be deleted after a certain period of time. And then there are gone. For ever.

It my seems anachronistic, but is there anything against the good old-fashioned flat file (Ok, nested are allowed too, you Cobol guys)? Have a day end processing, dump the necessary data into flat files, move them to the warehouse server, and load it, and reload it, and reload it a year after. The  advantages I see are:

  1. Consistent and congruent set of data from each source
  2. DWH processing does in no way affect the operation of OLTP system
  3. Backup and recovery is system inherent, just store your files on cheap tape
  4. Real delta information (Deleted records!) is accessible within two full stocks of data

I fully understand that deltas delivered by the sources, in a direct way are more easy to be handled in the beginning. But from my perspective I see the advantages in the long run of a interface. A flat file is there, and will not be changed until processed. It does not matter when this is done, there is a fair chance to reload all the data after a 3 day breakdown of your Data Warehouse. This is hard, and expensive to be implemented using direct connections. So: Trust the file! And give it a chance.

Written by Rayk Fenske

December 18, 2009 at 9:21 pm

Posted in General

Tagged with ,

lessons learned: there is a chance in every new beginning

with one comment

Starting next year I’ll move to a new job position. This wouldn’t be worth any mentioning if along with this does not comes me leaving IT departments after nearly 13 years of working with, in, and for Data Warehouses of different sizes and complexities across Europe. 7,5 years in line, and additional 5 years as BI consultant prior formed my way of handling both big and small, and both IT and social problems. I’m really looking forward the new challanges, but most of all the chance to shape a company at this level.

Before I dedicate myself to the implementation of a group wide Data Quality Management, fenced by a Metadata Management initiative, I would like to share some of the essentials I’ve learned the recent years. We’ll see what comes next, here at DWH Pages… Thanks in advance!

Written by Rayk Fenske

December 14, 2009 at 8:48 pm

Posted in General

Tagged with

The end of relational database management systems

leave a comment »

This short, and high-speed-to-read article at ACM offers a good list of arguments against the common relational DBMS world most of us are used to live in.

For the world of data warehousing the predicted solution is column based databases. At least databases, but not relational any more. But how do such column based solutions work? What is the linking element? IMHO there always needs to be a identifying key, and the associated value within the very column. All this optimized, compressed and stored in a bitmap like index. Additional you need information about how to join column information together. Do it with the original keys? Do it with sort of row ID pointing to the right line?Any other solutions? Ask google!

For me it all seems like old wine in new skins. At the end one need a high performing machine which does good hash calculation to build up and resolve indeces. In memory technology could spice up the query life. Seting up a data vault model is something in between, and I suppose a good way to have both a good and queryable database on one and good ad-hoc peformance on the other side.

A sugested further reading on column based databases is: Michael Stonebraker et al., “C-Store: A Column-oriented DBMS,” Proc 2005 VLDB Conference, Trondheim, Norway, Sept. 2005 (pdf).

Written by Rayk Fenske

August 26, 2009 at 9:09 pm

twitter? blogs? in companies?

leave a comment »

Recently I dived into the realm of twitter, and guess what, I’m still trapped there. On one hand it is fun to follow people and espacially friends on their actions, ideas, or hints, and keep them updated in a quick and unformal manner about what is happening im my life. But one the other hand I see some sort of potential for everyones professional life, and that is what I would like to talk about for now. At first I thought about a explanation about the usage of twitter on a personal level, but there are other sources doing this better, ask google. But, …

Establish a twitter server (would be nice to have this available, ‘am not sure about) within your company, into the DMZ if you like. Set up a wordpress server too. Imagine yourself as part of the BI team of your company. You may set up different blogs on Data Quality, Master Data Management, Enterprise Data Warehousing, or SOA activities. Create some twitter users, and link both the tools. Let your users from the business department follow various channels. It’s up to them to decide what if of interest to them.

Useful tweets could be about the general system status, or major malfunctions. More special tweets could be about the status of pre calculation engines, or the overall situation of your entrance layer / operational data store. Update your users about freshly released queries, add them as tiny URL into the tweet. What about KQI, your key quality indicators? And finaly you could enrich your internal news stream with external information. The intranet pages will be of certain use, but as news stream twitter could be more focused, user oriented.

Written by Rayk Fenske

August 19, 2009 at 7:57 pm

Posted in Meta, Tools

Tagged with , ,

query definition without BEx

leave a comment »

Via twitter.com/sapbwtweet and SDN I came across a not yet valued but evaluated tool to get a faster insight into your query definition without utilizing BEx and facing long waiting times.

  • Enter transaction se38
  • Execute programm RSRQ_QUERYDEFINITION
  • Enter a query name (known selection dialog from BEx)
  • Select / de-select some option
  • Execute
  • Analyze your query

In my opinion the structure of a query and all its elements are displayed in a transparent way without missing details. This transaction yould be used to analyze a query regarding filter values and the usage of variables much quicker than by see the real definition in BEx front end.

Written by Rayk Fenske

August 11, 2009 at 9:27 am

Posted in Reporting, Tools

Tagged with , ,

ny times on statistics

leave a comment »

Thanks to twitter I came across a interesting article at the online edition of the New York Times (here). It is about the problem of analyzing huge amounts of data and detect patterns within. The big companys a hiring more and more statisticians to develop numer crunching methods. The acquisition of SPSS by IBM outlines this tendency. Go for analysis, but do not forget reporting!

Written by Rayk Fenske

August 8, 2009 at 6:35 pm

Posted in General

Tagged with , ,

farewell scrum, welcome … what?

with 2 comments

Weeks ago we (which is my team and I) stopped what I labeled scrum before. The old postings about this experiment could be found here. For today I would like to reflect the latest stage and what comes next.

Last Scrum Board impression

Figure 1: Last Scrum Board impression

Figure 1 shows the last impression of our board. Most things are done with the release, somethings are left. In the upper left corner one could get an idea of the new usage of this expensive white board. I started nearly a year ago with the idea of utilizing different aspects of scrum, which were:

  • Daily stand-up meeting
  • Smaller iterations compared to our locale release cycle of 2 months
  • Tracking “things to be done” on a Board
  • Use a priorized Product Backlog owned by the Business Department (and me)
  • Review the release

My motivation behind was a) building up a team with a real spirit of cooperation and implementing solutions together and b) speeding up the development, short: Do more in less time. Things are working fine. IMHO we all performed much better in this time, we implemented more than originally planned per iteration, the dialogue between Business Department (BD) and me was as good as never before. The backlog was used by both parties to negotiate sprint content etc. But we failed. Why?

First of all I was not able to split “requests” from BD into such small pieces that a visual tracking was usefull. We commited us internally to a 3 week sprint duration. The common duration of a task was 5 days. Additional there is the need for solving incidents and thinks like. So, each developer is only able to implement 2 tasks per sprint. Given a team of 5 developers we had 10 story cards on board which remained for a long time in the “In development” slot and for a relativly short time in the “Test” slot. We met every day to talk about the one card. This was getting boring by the time. Then we experimented with a planing round. Which was a total flop. We only did once. What else? Nothing. What will change? A lot.

The board will still be used as so called information radiator, but with a different aproach: It is now there to sketch ideas of solutions, to visualize thougts, outlining architectures, data models, relations etc. I’ll move back to a weekly status meeting. This is than dedicated to talk about major issues, but also solutions, or topics from the other teams. Barriers are solved directly within the team. The Product Backlog will remain, because it is working too good. I’ll also try to have something new in production within every 3 weeks. Just to offer good service to BD. But I’ll cut the limit and will move towards a more continuous enhancement of the productive environment. I’ll do the release planing, which is necessary in the department, and also the estimation of the single entries in the backlog on my own. Assignment of tasks to single developers is also done directly by myself. So this is what changes most: The controling.

Main lesson learned: It is about the people! In this case about the team members, the developers. There is a need for a specific charateristic, some self motivation, the drive, to pick the next task on your own, to have enough confidence into your own decissions, to put commitments into your statements etc. Agile methods need specific people. Without them all the tools are nice, with them the tools produce added value.

Written by Rayk Fenske

July 31, 2009 at 12:52 pm

Posted in Agile Warehousing

Tagged with

TPS

leave a comment »

Thanks to a posting at codemonkeyism.com I bought Taiichi Ohnos book about the Toyota Production System. Originally published in 1978 it explains the way he changed the production system at Toyota into what is now known as The Toyota Way. Concepts like Just-In-Time (JIT), or kanban are explained.

With this book I try to clarify for myself whether Data Warehousing is production or product development (TPDS is the Toyota acronym for this). Am I part of the software development guild or not?

Written by Rayk Fenske

July 31, 2009 at 11:46 am

Posted in General

Tagged with ,