Thursday, September 17, 2020

Service Accounts suck - why data futures require end to end authentication.

 Can we all agree that "service" accounts suck from a security perspective.  Those are the accounts that you set up so what system/service can talk to another one.  Often this will be a database connection so the application uses one account (and thus one connection pool) to access the database.  These service accounts are sometimes unique to a service or application, but often its a standard service account for anything that needs to connect to a system.

The problem is that security is then defined at the service account level, not based on the users actually making the requests.  So if that database contains the personal information of every customer, you are relying on the application to ensure it only displays the information for a given customer: the security isn't with the data, it's with the application.

Back in 2003 a group called the "Jericho Forum" was set up under the Open Group to look at the infrastructural challenges of de-perimeterisation, and they created a set of commandments, the first of which is:
The scope and level of protection should be specific and appropriate to the asset at risk. 

Service accounts break this commandment: they take the most valuable asset (the data), effectively remove its security scope and place it in the application.   What needs to happen is that the original requestor of the information is authenticated at every level, OAuth-style, so that if I'm only allowed to see my data, then even if someone makes an error in the application code, or I run a Bobby Tables style SQL injection attack, my "SELECT *" only returns my records.
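To make that concrete, here is a minimal sketch in Java.  It assumes a PostgreSQL-style database with a row-level security policy keyed on a session setting called app.current_customer, and a hypothetical oauthSubject string carrying the authenticated user's identity; the names are illustrative, the point is that the end user's identity, not a shared service account, is what reaches the database:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Statement;
import javax.sql.DataSource;

public class PerUserQuery {
    // dataSource and oauthSubject are illustrative; the point is that the end user's
    // identity, not a shared service account, is what the database filters on.
    static void printMyCustomers(DataSource dataSource, String oauthSubject) throws Exception {
        try (Connection conn = dataSource.getConnection()) {
            // Bind the OAuth-authenticated subject to this session; an assumed row-level
            // security policy on the customers table filters every query on this value.
            try (PreparedStatement set = conn.prepareStatement(
                    "SELECT set_config('app.current_customer', ?, false)")) {
                set.setString(1, oauthSubject);
                set.execute();
            }
            // Even a blanket query now only returns the rows this user is allowed to see.
            try (Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery("SELECT * FROM customers")) {
                while (rs.next()) {
                    System.out.println(rs.getString("customer_id"));
                }
            }
        }
    }
}

The security decision now lives with the data: the database, not whichever application happens to hold the connection, decides what "SELECT *" returns.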

This changes a lot of things, connection pooling for starters, but for reporting in particular we have to get away from technologies that force system accounts and therefore require multiple security models to be implemented within the consumption layer.

The appropriate level at which to protect data is the data level, and the scope is the data itself.  Only by shifting our perception of data away from service accounts and databases, towards data being the asset, can we start building security models that actually secure data as an asset.

Today most data technologies assume service accounts, which means that most data technologies don't treat data as an asset.  This has to change.

Thursday, August 27, 2020

Getting RocksDB working on a Raspberry Pi (UnsatisfiedLinkError when trying to run Kafka Streams)

If you are here it's probably because you've tried to get RocksDB working on a Raspberry Pi and hit the following exception:

Exception in thread "main-broker-b066f428-2e48-4d73-91cd-aab782bd9c4c-StreamThread-1" java.lang.UnsatisfiedLinkError: /tmp/librocksdbjni7453541812184957798.so: /tmp/librocksdbjni7453541812184957798.so: cannot open shared object file: No such file or directory (Possible cause: can't load IA 32 .so on a ARM platform)

at java.base/java.lang.ClassLoader$NativeLibrary.load0(Native Method)

at java.base/java.lang.ClassLoader$NativeLibrary.load(ClassLoader.java:2452)


The reason for this is that your rocksdbjni jar file doesn't include the required shared library (.so file) for the ARM platform.  It does include x86 builds for Linux, a Windows DLL and even a PPC(!) library, but nothing for ARM Linux, so you are going to have to roll your own.

Step 1 - Doing the apt-gets

You have to do the build ON the Raspberry Pi as it's native code that needs to be compiled, and the instructions for RocksDB aren't overly helpful as they assume a couple of things are already installed.
The following appear to be required:
sudo apt-get install cmake

sudo apt-get install vagrant

sudo apt-get update --fix-missing

sudo apt-get install vagrant


The vagrant part appears not to be strictly required, but I had issues when it wasn't installed, so it probably pulls in a dependency that the build needs.


Step 2 - getting the code for RocksDB

Now you are going to have to download the code for RocksDB.  I hope you've got a fairly large SD card, as this will take up around a gig of space all told.

Make a directory under your home directory (I call mine 'dev') and change into the directory (cd dev).

git clone https://github.com/facebook/rocksdb.git rocksdb


Then change directory into rocksdb.  If you are a Java developer, you are possibly about to meet for the first time the tool (make) that is the reason Java people created Ant and Maven, because they REALLY hated how it worked...


Step 3 - running make


Then you want to run 

make rocksdbjavastaticrelease


This will take a while but will produce a nice JNI jar file for your Raspberry Pi that you can add to your classpath, and you'll be away.... The jar file will be under the "java/target" directory under rocksdb, so if you are on the default install with user "pi" it will be 

/home/pi/dev/rocksdb/java/target/rocksdbjni-6.12.0-linux32.jar


Copy that file onto your classpath and RocksDB should now work fine.
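As a quick sanity check (a minimal sketch; the /tmp path and the key/value are arbitrary), something like this should now run on the Pi without the UnsatisfiedLinkError once the rebuilt jar is on the classpath:

import org.rocksdb.Options;
import org.rocksdb.RocksDB;

public class RocksCheck {
    public static void main(String[] args) throws Exception {
        // This is the call that fails with UnsatisfiedLinkError when the jar
        // has no ARM .so; with the locally built jar it should load cleanly.
        RocksDB.loadLibrary();
        try (Options opts = new Options().setCreateIfMissing(true);
             RocksDB db = RocksDB.open(opts, "/tmp/rocks-check")) {
            db.put("hello".getBytes(), "pi".getBytes());
            System.out.println(new String(db.get("hello".getBytes())));  // prints "pi"
        }
    }
}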


Step 4 - if running a Kafka Streams application

I'm sure there is a logical reason for this, and that I could have found another workaround, but having run the original application and hit the linker error when it accessed RocksDB, there was a new error when running Kafka Streams:

Exception in thread "main-broker-00a490b4-b50a-4cc4-aded-d6324ad0f291-StreamThread-1" org.apache.kafka.streams.errors.ProcessorStateException: Error opening store reading-topic-STATE-STORE-0000000000 at location /tmp/kafka-streams/main-broker/0_0/rocksdb/reading-topic-STATE-STORE-0000000000

at org.apache.kafka.streams.state.internals.RocksDBTimestampedStore.openRocksDB(RocksDBTimestampedStore.java:87)

at org.apache.kafka.streams.state.internals.RocksDBStore.openDB(RocksDBStore.java:188)

at org.apache.kafka.streams.state.internals.RocksDBStore.init(RocksDBStore.java:224)

at org.apache.kafka.streams.state.internals.WrappedStateStore.init(WrappedStateStore.java:48)

at org.apache.kafka.streams.state.internals.ChangeLoggingKeyValueBytesStore.init(ChangeLoggingKeyValueBytesStore.java:42)


This is weird, as the application had never successfully run (it couldn't create the DB), but for some reason there is state somewhere (I assume in the broker) that still exists for the application id, presumably because that registration happens before the linker error.  So the fix is to change "application.id" in the Kafka configuration to something new, and it should all run fine.
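If you configure your streams application in code rather than in a properties file, the equivalent fix is just giving StreamsConfig a fresh id.  A sketch (the id and broker address are whatever you use):

import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;

public class StreamsProps {
    static Properties newConfig() {
        Properties props = new Properties();
        // A new application.id means new internal topics and a new state directory,
        // so the stale state left over from the failed runs is simply never touched.
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "main-broker-v2");     // the fresh id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");  // your broker
        return props;
    }
}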

Wednesday, August 24, 2016

Why taking good holidays is good practice

Back when I was a fairly recent graduate I was given one of the best pieces of advice I've ever received.  The project was having some delivery pressures and I was seen as crucial to one of the key parts.  As a result my manager was putting pressure on me to cancel my holiday (two weeks of windsurfing bliss in the Med with friends), with a promise that the company would cover the costs.  I was called into the BIG boss's office with the full expectation that he would put the screws on further, and in my head I had a list of additional demands.

"Steve, I've heard that XXX [name redacted] wants you to cancel your holiday"
"Yes, we've got to get the release out and I'm the one who knows most about resolving issues in the team"
"Don't"

This kind of stumped me, so I sat there a bit quiet, and then came the best advice ever:

"If you cancel this holiday then all anyone will remember is that they can make you cancel holidays, they won't actually appreciate it.  If you go on holiday and it all collapses all everyone will remember is that it collapsed without you, if it goes badly or much harder because you aren't there they will remember that, and if its OK without you then hopefully you have enough talent to not insist that the universe revolves around you.  Go on holiday, don't be contactable and remember that now more senior folks know about you because you didn't cancel your holiday."

It's a mantra I've lived by.  Take your holidays, plan to take them, and take them PROPERLY.  That means:

  1. Change your password before leaving and DON'T update it on any mobile devices
  2. Your work laptop does NOT travel on holiday with you
  3. You don't do conference calls, standups, "just 30 minutes a day" or any other nonsense.
Since I've become a manager I think this is even more important.  If I can't manage and succession-plan to the level that the universe doesn't collapse while I'm away, then what sort of manager am I?  I actually take a very dim view of folks who don't take holidays properly, as they risk promoting a "macho hero" culture rather than a decent work/life balance culture.  If you don't recharge your batteries properly and spend proper time with your family, then really, what is the point?

Planning for holidays is a sign of good planning, requiring people to cancel holidays is a sign of bad planning.  Requiring yourself to cancel holidays is a sign of extremely bad planning and bad succession management.

If you are so bad a manager that your team can't cope for 2 weeks without you, then you need to look at how you are developing your next level.  If you are so scared that people won't "miss" you when you are out of the office then you really need to check yourself and look at your career ambitions and direction.

Taking a good holiday (vacation for my American colleagues) is one of the key reasons WHY we work: explore the world, meet new people, do new things, RELAX. 

Back to the story at the top: I came back after the holiday to find issue lists and problems; it took four days of hard work to get us back level, and everyone saw me as the hero.  Had I cancelled my holiday, none of the management team would have known how crucial I was to the team's success; because of the holiday I was rapidly promoted into a new role.  As a developer, taking that holiday led directly to people appreciating much more what I did for the team.

These days a key success factor for my team is how brilliantly they cope when I'm not there.  I'm not worried that this means I'm irrelevant; it means that as we grow and take on new challenges, my leadership team is able to grow, develop and take those challenges on, leaving me free to work on what is next.

Good holidays are good practice.


Monday, August 01, 2016

The ten commandments of IT projects


And lo a new project did start and there was much wailing and gnashing of teeth, for up on the board had been nailed ten commandments that the project must follow and the developers were sore afraid.

  1. Thou shalt put everything in version control, yea, even the meeting minutes, presentations and "requirements documents that aren't even finished yet", for without control everything is chaos
  2. Thou shalt not break the build
  3. Thou shalt never deploy to a shared environment anything that is not under version control
  4. Thou shalt be honest about how long a task will take
  5. Thou shalt test both the happy and unhappy path
  6. Thou shalt automate everything that can be automated and do so early
  7. Thou shalt not re-invent the wheel "just because"
    • Most specifically thou art not Doug Lea, do not create your own threading library
      • If you are Doug Lea then just use your own threading library
  8. Assume Murphy's Law to be always true and never state "that is very unlikely to happen"
  9. Code unto others as you would have them code towards you
  10. Do not screw with the environment configurations without first verifying you are not messing things up for others
The punishment for breaking these commandments is to be relegated to the project management office and be responsible for the many and various Excel spreadsheet reports that people outside of the project appear to hold sacred.  

-----

Now, because I know that folks will look to interpret these things in many ways, let's be clear on a few.

9. Code unto others as you would have them code towards you

This is important: you should write code that is designed to be understood by someone else, written in a way that, if you came to it fresh, you would be able to understand.
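A trivial, made-up illustration of the difference (all names invented):

public class RefundRules {
    // Hard on the next reader: the intent lives only in the original author's head.
    static boolean chk(int d, boolean f) {
        return d <= 30 && !f;
    }

    // The same rule written for whoever comes to it fresh.
    static boolean isEligibleForRefund(int orderAgeInDays, boolean itemDamaged) {
        final int RETURN_WINDOW_DAYS = 30;
        return orderAgeInDays <= RETURN_WINDOW_DAYS && !itemDamaged;
    }
}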

6. Thou shalt automate everything that can be automated and do so early

This is important: too often people don't automate, and more importantly don't automate early, trying to retrofit the automation once the project reaches a certain size.  You should really have everything automated before a single line of real code is written: a skeleton that 'does nothing successfully'.
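In practice that skeleton can be as small as a single committed test whose only job is to prove the build, test and deploy automation runs end to end before any feature work starts.  A sketch using JUnit (the class name is made up):

import org.junit.jupiter.api.Test;
import static org.junit.jupiter.api.Assertions.assertTrue;

// "Does nothing successfully": deliberately trivial, committed on day one so the
// whole pipeline (build, tests, packaging, deployment) gets exercised immediately.
class WalkingSkeletonTest {
    @Test
    void pipelineRunsEndToEnd() {
        assertTrue(true);
    }
}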

5. Thou shalt test both the happy and the unhappy path 
8. Assume Murphy's Law is true

Too often I've heard the phrase "that won't happen" followed by "it happened on the first day in production".  This is why it's really important to test the unhappy path and to clearly identify the bounds you are going to be operating within.  Hoping that something won't happen isn't a strategy.
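A minimal illustration of testing both paths (an invented function; JUnit shown, but any test framework will do):

import org.junit.jupiter.api.Test;
import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.junit.jupiter.api.Assertions.assertThrows;

class PercentageTest {

    // Invented function under test: what share of 'total' does 'part' represent?
    static double percentage(double part, double total) {
        if (total == 0) {
            throw new IllegalArgumentException("total must not be zero");
        }
        return (part / total) * 100.0;
    }

    @Test
    void happyPath() {
        assertEquals(25.0, percentage(1, 4), 0.0001);
    }

    @Test
    void unhappyPath_zeroTotal_isRejectedNotIgnored() {
        // Murphy's Law: "total is never zero" will turn out to be false in production.
        assertThrows(IllegalArgumentException.class, () -> percentage(1, 0));
    }
}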

Wednesday, January 28, 2015

Making DevOps Business Driven - a service view

I've been doing a bit recently around DevOps, and what I've been seeing is that companies that have been scaling DevOps tend to run into a problem: exactly what is a good boundary for a DevOps team?  Now, I've talked before about how microservices are just SOA with a new logo; well, there is an interesting piece about DevOps as well: it's not actually a brand new thing.  It's an evolution and industrialisation of what was leading practice several years ago.

Back in 2007 I gave a presentation on why SOA was a business challenge (full deck at the end) and in there were two pictures that talked about how you needed to change the way you thought about services:


So on the left we've got a view that says you need to think about a full lifecycle, and on the right you've got a picture that talks about the need to have an architect, an owner and a delivery manager (programme manager).
This is the structure we were using on SOA projects back in 2007, getting the architects and developers (but ESPECIALLY the architects) to be accountable for the full lifecycle.  It's absolutely fantastic to see this becoming normal practice, and there are some great lessons and technical approaches out there.

One thing I've not seen, however, is an answer to what a DevOps team should be and how to manage a large number of DevOps teams.  This is where Business Architecture comes in: the point is that it's not enough to just have lots and lots of DevOps teams, you need to align them to the business owners and to the structure that is driving them.  You also need that structure so one team doesn't just call the 'Buy from Ferrari' internal service without going through procurement first for approval.

So in a DevOps world we are beginning to realise the full-lifecycle view of Business Services, providing a technical approach to automating and managing services that look like the business, evolve like the business and give the business a structure where it can focus costs where they deliver the most value.

There is much that is new in the DevOps world, but there is also much we can learn from the Business Architecture space on how to set up DevOps teams to better align to the business and enable DevOps to scale at traditional complex organisations as well as simpler (from a business model perspective) internet companies.


Tuesday, January 20, 2015

Big Data and the importance of Meta-Data

Data isn't really respected in businesses.  You can see that because, unlike other corporate assets, there is rarely a decent corporate catalog that shows what exists and who has it.  In the vast majority of companies there is more effort and automation put into tracking laptops than into cataloging and curating information.

Historically we've sort of been able to get away with this because information has resided in disparate systems and even those which join it together, an EDW for instance, have only had a limited number of sources and have viewed the information only in a single way (the final schema).  So basically we've relied on local knowledge of the information to get by.  This really doesn't work in a Big Data world.

The whole point in a Big Data world is having access to everything, being able to combine information from multiple places within a single Business Data Lake so you can allow the business to create their own views.

Quite simply, without Meta-Data you are not giving them any sort of map to find the information they need or to help them understand the security required.  Meta-Data needs to be a day-one consideration on a Big Data program; by the time you've got a few dozen sources imported, it's going to be a pain going back and adding the information.  This also means the tool used to search the Meta-Data is going to be important.

In a Big Data world Meta-Data is crucial to making the Data Lake business friendly and essential in ensuring the data can be secured.  Let's be clear here: HCatalog does matter, but it's not sufficient.  You can do a lot with HCatalog, but that is only the start, because you've got to look at where information comes from, what its security policy is, and where you've distilled that information to.  So it's not just about what is in the HDFS repository; it's about what you've distilled into SQL or Data Science views, and it's about how the business can access that information, not just "you can find it here in HDFS".

This is what Gartner were talking about in the Data Lake Fallacy, but as I've written elsewhere, that sort of missed the point: HDFS isn't the only part of a data lake, and EDW approaches only solve one set of problems, not the broader challenge of Big Data.

Meta-Data tools are out there, and you've probably not really looked at them, but here is what you need to test (not a complete list, but these for me are the must-have requirements):
  1. Lineage from source - can it automatically link to the loading processes to say where information came from?
  2. Search - Can I search to find the information I want?  Can a non-technical user search?
  3. Multiple destinations - can it support HDFS, SQL and analytical destinations?
  4. Lineage to destination - can it link to the distillation process and automatically provide lineage to destination?
  5. Business View - can I model the business context of the information (Business Service Architecture style)
  6. My own attributes - can I extend the Meta-data model with my own views on what is required?
The point about modelling in a business context is really important.  Knowing information came from an SAP system is technically interesting, but knowing it's Procurement data that is blessed and created by the procurement department (as opposed to being a secondary source) is significantly more valuable.  If you can't present the meta-data in a business structure, business users aren't going to be able to use it; it's just another IT-centric tool.
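As a sketch of what that means concretely (all names here are invented; real meta-data tools model this for you), a single catalog entry needs to carry the technical lineage, the business context and room for your own attributes, matching the checklist above:

import java.util.List;
import java.util.Map;

// Illustrative only: one catalog entry covering the checklist above (lineage from
// source, lineage to destinations, business context, custom attributes).
record MetaDataEntry(
        String dataSetName,             // e.g. "Supplier invoices"
        String sourceSystem,            // technical lineage: the system it was loaded from
        List<String> destinations,      // HDFS path, SQL view, data science extract, ...
        String businessService,         // business context: e.g. "Procurement"
        String owningDepartment,        // who blesses and curates it
        String securityClassification,  // drives who is allowed to see it
        Map<String, String> customAttributes) { // your own extensions
}

A non-technical user searching a catalog shaped like this can ask for "Procurement data blessed by the procurement department" rather than needing to know an HDFS path.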

The advantage of Business Service structured meta-data is that it matches up to how you evolve and manage your transactional systems as well.