A few weeks on the road Nov 23, 2020
I spent the last few weeks on the road. I had originally intended it to be a cross-country road trip in search of new perspectives; it didn't quite amount to that, though it was good to get out of town for a while. Even shorter trips to new locations help clear the head, and I suspect that after this past year, many minds need some decluttering!
The first stop was Charleston, South Carolina, a quintessential southern town. After exploring King Street, The Battery, and the old Antebellum houses of historic Charleston, the journey continued on to the Tampa Bay, Florida area to spend some time on the gulf. There is a lot to be said for being on the water; the waves crashing on the shore, the salt in the air, and the slower pace of life are therapeutic when contrasted with fast-paced startup culture. The South offers a lot that other parts of the country do not. Besides the beautiful weather this time of year, Southern courtesy is unparalleled, and it is a refreshing change from the no-nonsense, business-oriented mannerisms of the Northeast. Of course I'm painting broad strokes here, but it's something one has to experience firsthand.
One major downside to modern society, no matter where you go, is the pervasiveness of a homogeneous corporate culture. Everywhere you look there are strip malls, McDonalds and Walmarts, and the same rest stops, hotels, and types of business establishments. Global interconnectivity and connectedness have many benefits, but a major disadvantage is that no idea stays unique for long, and novel / original concepts are instantly copied and replicated without end.
The human psyche needs both the familiar and novelty, and the latter is increasingly hard to find in the world. Even digitally, many concepts have already been pioneered, and many new innovations are rehashes of old ideas. Humanity will need to reconcile this dichotomy in the future, as there is no turning back and no stemming innovation or the pioneering spirit. The only strategy I see for individuals and organizations to adapt in the modern age is to keep moving forward; accomplishments can be celebrated with brief periods of respite, but the hyper-competitive nature of the world doesn't leave much room for leniency... talk about pressure!

After a few weeks, it was time to return to NY, my home state and location of maximum productivity. An 18+ hour drive from Florida ended in my hometown of Syracuse. This week will be spent here for the Thanksgiving holiday before returning to The Matrix, NYC, which, while still in the midst of the ongoing pandemic and an economic crisis that is just beginning to unfold, is critical to return to in order to leverage the many business and networking opportunities that a major metropolitan center affords. Plus NYC is a fun city to live in!
In other news, be sure to check out Dev Null Productions' latest product: FleXRP - a tool to set up your XRP account for the upcoming Flare Networks Spark Token Airdrop. While Flare will run on an independent blockchain, the token distribution will be based on XRP account balances, with credentials being seeded from those XRP accounts that have been associated with ERC-compatible (Ethereum-based) addresses. FleXRP takes care of this setup process via a simple-to-use interface: run the app, provide the required information, and you are good to go!

Lastly, I recommend fellow entrepreneurs, innovators, and other ambitious individuals read How to Win Friends and Influence People, that is, if you haven't yet... I suspect this one is given to all business school students on day one! Having listened to the audiobook on my road trip, I found the knowledge indispensable in the domain of human relations. While the concepts can be found summarized online, the anecdotes in the book really exemplify the points being made and demonstrate practical application.
While we're on the subject, also be sure to check out Thick Face, Black Heart, a timeless classic on practical Eastern business philosophy.


That's all for now, until next time!
The Last Two Years Nov 1, 2020
The last post on this blog (now mobile friendly!) was just over two years ago, shortly after Dev Null Productions was launched and long before COVID-19. That world is now foreign, and as we live and grow, we transform in ways we never expected; our experiences drive us to new insights and perspectives. The past few years have seen a lot of changes, which in turn has led to much reflection. Meditation helps day in and day out in this increasingly complex world. There are many great articles on the subject; I encourage all to read the one that introduced me to the subject many years ago.

I also advise caution to those finding themselves spending a large amount of time in The Matrix. While technical development can be fruitful and rewarding, there is a lot to be said for balance in life and human connection. Any worthwhile endeavor takes many hours of focus, especially one as complex as launching a business, and each individual needs to undertake their own journey. Dev Null Productions moves forward and I'm excited for what's next in store.
But before that I'll be taking some personal time off; many hours were spent during the first half of the year running the NYC/XRP meetup and building and launching our latest project, Zerp Tracker. I'm proud of what we delivered, and the product is working beautifully with no manual intervention, but the body and mind need to recuperate.
I've been playing a lot of guitar, honing my rollerblading technique, and planning a cross-country road trip for the near future... stay tuned for photos! Unfortunately the pandemic and looming economic crisis may affect that, the extent and ramifications of which no one will be able to foresee, but the open road calls and I look forward to the sights and experiences across this great planet.
IBM To Buy RedHat & Dev Null Update Oct 30, 2018
I had no idea about the acquisition when I left Red Hat last spring, but I'll just leave this one here:

In other news, Dev Null is going great; the product has more or less stabilized, we're hitting up conferences all over the country to promote it, and it's gaining some traction with the XRP community on social media. There's still a long way to go to see Wipple reach full fruition, but I'm excited by the progress and eager to continue driving development.
That's all for now. I have a laundry list of topics which I'd like to blog about, but until I get some time to do so, they will have to stay on the back burner!

Dev Null Productions Jul 23, 2018
After my Departure from RedHat I was able to get some R&R, but I quickly wanted to get a head start on my next venture. This is because I decided to put a cap on the amount of time that would be dedicated to trying to make "it" happen, and to pause at regular checkpoints to monitor progress. This is not to say I'm going to quit the endeavor at that point in the future (the timeframe of which I'm keeping private), but the intent is to drive focus and keep the ball moving forward objectively. While Omega was a great project to work on, both fun and the source of much growth and experience, I am not comfortable with the amount of time spent on it for what was gained. Hindsight is 20/20, but every good trader knows when to cut losses.
Dev Null Productions LLC was launched four months ago in April 2018 and we haven't looked back. Our flagship product, Wipple XRP Intelligence, was launched shortly after, providing real-time access to the XRP network along with high level stats and reporting. The product is under continued development and we've begun a social-media based marketing drive to promote it. Things are still early, and there is still a ways to go and obstacles to overcome (not to mention the crypto-currency bear market that we've been in for the last half year), but the progress has been great, and there are many more awesome features in the queue.
This Thursday, I am giving a presentation on XRP to the Syracuse Software Development Meetup, hosted at the Tech Garden, a tech incubator in Syracuse, NY. I aim to go over the XRP protocol, discussing both the history of Ripple and the technical details, as well as common use cases and gotchas from our experiences. The event is looking very solid, with a large turnout expected and some great momentum building, so I'm excited to participate and see how it all goes. While we're still in the early phases of development, I'm hoping to drive some interest in the project, and perhaps meet collaborators who'd like to come on board for a percentage of ultimate profits!
Be sure to stay tuned for more updates and developments, until then, keep Rippling!

Into The Unknown - My Departure from RedHat Apr 15, 2018
In May 2006, a young, starry-eyed intern walked into the large corporate lobby of RedHat's Centennial Campus in Raleigh, NC, to begin what would be a 12 year journey full of ups and downs, breakthroughs and setbacks, and many, many memories. Flash forward to April 2018, when the intern-turned-hardened-software-engineer filed his resignation and ended his tenure at RedHat to venture into the risky but exciting world of self-employment / entrepreneurship... In case you were wondering, that former intern / software engineer is me, and after nearly 12 years at RedHat, I finished my last day of employment on Friday, April 13th, 2018.
Overall RedHat was a great experience; I was able to work on many ground-breaking products and technologies, with many very talented individuals from across the spectrum and the globe, in a manner that facilitated maximum professional and personal growth. It wasn't all sunshine and lollipops though; there were many setbacks, including many cancelled projects and dead-ends. That being said, I felt I was always able to speak my mind without fear of repercussion, and I always strove to work on the items that mattered the most and had the furthest reaching impact.
Some (but certainly not all) of those items included:
- The highly publicized, but now defunct, RHX project
- The oVirt virtualization management (cloud) platform, where I was on the original development team, and helped build the first prototypes & implementation
- The RedHat Ruby stack, which was a battle to get off the ground (given the prevalence of the Java and Python ecosystems, continuing to this day). This is one of the items I am most proud of: we addressed the needs of both the RedHat/Fedora and Ruby communities, building the necessary bridge logic to employ Ruby and Rails solutions in many enterprise production environments. This continues today as the team stays on top of upstream Ruby developments and provides robust support and solutions for downstream use
- The RedHat CloudForms projects, on which I worked for several iterations, again including initial prototypes and standards, as well as ManageIQ integration.
- ReFS reverse engineering and parser. The last major research topic that I explored during my tenure at RedHat, this was a very fun project where I built upon the sparse information about the filesystem internals that's out there, and was able to deduce the logic to the point of being able to read directory listings, file contents, and metadata out of a formatted filesystem. While there is plenty of work left on this front, I'm confident that the published writeups are an excellent launching point for additional research as well as the development of formal tools to extract data from the filesystem.
My plans for the immediate future are to take a short break, then file to form an LLC and explore options under that umbrella. Of particular interest are crypto-currencies, specifically Ripple. I've recently begun developing an integrated wallet, ledger and market explorer, and statistical analysis framework called Wipple, which I'm planning to continue working on and, if all goes according to plan, generating some revenue from. There is a lot of ??? between here and there, but that's the name of the game!
Until then, I'd like to thank everyone who helped me do my thing at RedHat, from landing my initial internships and the full-time position after that, to showing me the ropes, and not shooting me down when I put myself out there to work on and promote our innovative solutions and technologies!

RETerm to The Terminal with a GUI Dec 13, 2017
When it comes to user interfaces, most (if not all) software applications can be classified into one of three categories:
- Text Based - whether they entail one-off commands, interactive terminals (REPL), or text-based visual widgets, these saw a major rise in the 50s-80s though were usurped by GUIs in the 80s-90s
- Graphical - GUIs, or Graphical User Interfaces, facilitate creating visual windows which the user may interact with via the mouse or keyboard. There are many different GUI frameworks available for various platforms
- Web Based - A special type of graphical interface rendered via a web browser, many applications provide their frontend via HTML, Javascript, & CSS

In recent years modern interface trends seem to be moving in the direction of Web User Interfaces (WUIs), with increasing numbers of apps offering their functionality primarily via HTTP. That being said, GUIs and TUIs (Text User Interfaces) remain entrenched for various reasons:
- Web browsers, servers, and network access may not be available or permissible on all systems
- Systems need mechanisms to access and interact with the underlying components, in case higher level constructs, such as graphics and network subsystems, fail or are unreliable
- Simpler text & graphical implementations can be coupled and optimized for the underlying operational environment without having to worry about portability and cross-env compatibility. Clients can thus be simpler and more robust.
Finally, there is a certain pleasing aesthetic to simple text interfaces that you don't get with GUIs or WUIs. Of course this is a human-preference sort of thing, but it's often nice to return to our computational roots as we move into the future of complex gesture and voice controlled computer interactions.

When working on a recent side project (to be announced), I was exploring various concepts for the user interface to throw on top of it. Because other solutions exist in the domain in which I'm working (and for other reasons), I wanted to explore something novel as far as user interaction goes, and decided to experiment with a text-based approach. ncurses is the go-to library for this sort of thing, being available on most modern platforms, along with many widget libraries and high level wrappers.

Unfortunately ncurses comes with a lot of boilerplate, and it made sense to separate that from the project I intend to use this for. Thus the RETerm library was born, with the intent to provide a high level DSL to implement terminal interfaces and applications (... in Ruby of course <3 !!!)

RETerm, aka the Ruby Enhanced TERMinal, allows the user to incorporate high level text-based widgets into an organized terminal window, with seamless, standardized keyboard interactions (mouse support is on the roadmap to be added). So for example, one could define a window containing a child widget like so:
require 'reterm'
include RETerm

value = nil
init_reterm {
  win = Window.new :rows => 10,
                   :cols => 30
  win.border!
  update_reterm

  slider = Components::VSlider.new
  win.component = slider
  value = slider.activate!
}

puts "Slider Value: #{value}"
This would result in the following interface containing a vertical slider:

RETerm ships with many built-in widgets, including:
Text Entry

Clickable Button

Radio Switch/Rocker/Selectable List



Sliders (both horizontal and vertical)
Dial
Ascii Text (with many fonts via artii/figlet)

Images (via drawille)

RETerm is now available via rubygems. To install, simply:
$ gem install reterm
That's All Folks... but wait, there is more!!! After all:
For a bit of a value-add, I decided to implement a standard schema where text interfaces can be described in a JSON config file and loaded by the framework, similar to the XML schemas which GTK and Android use for their interfaces. One can simply describe their interface in JSON and the framework will instantiate the corresponding text interface:
{
  "window" : {
    "rows"   : 10,
    "cols"   : 50,
    "border" : true,
    "component" : {
      "type" : "Entry",
      "init" : {
        "title" : "<C>Demo",
        "label" : "Enter Text: "
      }
    }
  }
}

To assist in generating this schema, I implemented a graphical designer, where components can be dragged and dropped into a 2D canvas to layout the interface.
That's right, you can now use a GUI based application to design a text-based interface.

The Designer itself can be found in the same repo as the RETerm project, located in the "designer/" subdir.

To use it you need to install visualruby (a high level wrapper around ruby-gnome) like so:
$ gem install visualruby
And that's it! (for real this time) This was certainly a fun side-project to a side-project (toss in a third "side-project" if you consider the designer to be its own thing!). As I return to the project using RETerm, I aim to revisit it every so often, adding new features, widgets, etc....
EOF
CLS
Why I still choose Ruby Oct 7, 2017
With the plethora of languages available to developers, I wanted to do a quick follow-up post as to why, given my experience in many different environments, Ruby is still the go-to language for all my computational needs!

While different languages offer different solutions in terms of syntax support, memory management, runtime guarantees, and execution flows, the underlying arithmetic, logical, and I/O hardware being controlled is the same. Thus in theory, given enough time and optimization, the performance differences between languages should go to 0 as computational power and capacity increases / goes to infinity (yes, yes, Moore's law and such, but let's ignore that for now).
Of course different classes of problem domains impose their own requirements:
- real-time processing depends on low-level optimizations that can only be done in assembly and C,
- data crunching and process parallelization often need minimal latency and optimized runtimes, something which you only get with compiled / statically-typed languages such as C++ and Java,
- and higher level languages such as Ruby, Python, Perl, and PHP are great for rapid development cycles and for providing high level constructs where complicated algorithms can be invoked via elegant / terse means.
But given the rapid improvement in hardware performance in recent years, whole classes of problems which were previously limited to 'lower-level' languages such as C and C++ can now feasibly be implemented in higher level languages.

(source)
Thus we see high performance financial applications being implemented in Python, major websites with millions of users a day being implemented in Ruby and JavaScript, massive data sets being crunched in R, and much more.
So putting the performance aspect of these environments aside, we need to look at the syntactic nature of these languages as well as the features and tools they offer developers. The last is the easiest to tackle, as these days most notable languages come with compilers/interpreters, debuggers, task systems, test suites, documentation engines, and much more. This was not always the case though, as Ruby was one of the first languages to pioneer built-in package management through rubygems and integrated dependency solutions via gemspecs, bundler, etc. CPAN and a few other language-specific online repositories existed before, but with Ruby you got integration that was a core part of the runtime environment and community support. Ruby is still known to be on the leading front of integrated and end-to-end solutions.
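For illustration, here is a minimal, hypothetical gemspec showing the kind of integrated packaging and dependency metadata referred to above (the gem name, author, and dependency are placeholders, not an actual published gem):

# mygem.gemspec - placeholder example of gemspec-based packaging
Gem::Specification.new do |spec|
  spec.name    = "mygem"
  spec.version = "0.1.0"
  spec.authors = ["A. Developer"]
  spec.summary = "Example gem demonstrating gemspec-based packaging"
  spec.files   = Dir["lib/**/*.rb"]
  spec.add_dependency "json", "~> 2.0"  # dependencies resolved by rubygems / bundler
end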
Syntax differences are a much more difficult subject to discuss objectively, as much of it comes down to programmer preference, but it would be hard to object to the statement that Ruby is one of the most object-oriented languages out there. It's not often that you can call the string conversion or type identification methods on ALL constructs, variables, constants, types, literals, primitives, etc:
> 1.to_s
=> "1"
> 1.class
=> Integer
Ruby also provides logical flow control constructs not seen in many other languages. For example, in addition to the standard if condition then dosomething paradigm, Ruby allows the user to specify the action before the predicate, eg dosomething if condition. This simple change allows developers to express concepts in a natural manner, akin to how they would often be described between humans. In addition to this, other simple syntax conveniences include:
- The unless keyword, simply evaluating to if not
file = File.open("/tmp/foobar", "w") file.write("Hello World") unless file.exist?
- Methods are allowed to end with ? and !, which is great for distinguishing non-mutating query methods (eg. Socket.open?) from mutating methods and/or methods that can throw an exception (eg. DBRecord.save!)
- Inclusive and exclusive ranges can be specified via parentheses and two or three dots. So for example:
> (1..4).include?(4)
=> true
> (1...4).include?(4)
=> false
- The yield keyword makes it trivial for any method to accept and invoke a callback during the course of its lifetime
- And much more
Expanding upon the last point, blocks are a core concept in Ruby, one which the language nails right on the head. Not only can any function accept an anonymous callback block, blocks can be bound to parameters and operated on like any other data. You can check the number of parameters a callback accepts by invoking block.arity, dynamically dispatch blocks, save them for later invocation, and much more.
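For example, a small sketch of blocks as first-class data (the method name and the stored callback list are arbitrary choices for illustration):

# Blocks bound to parameters can be inspected, stored, and invoked later
def register(&callback)
  (@callbacks ||= []) << callback  # save the block for later invocation
  callback.arity                   # how many parameters the callback accepts
end

register { |event| puts "got #{event}" }  # => 1
@callbacks.each { |cb| cb.call(:ready) }  # dispatch the saved blocks dynamically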
Due to the asynchronous nature of many software solutions (many problems can be modeled as asynchronous tasks) blocks fit into many Ruby paradigms, if not as the primary invocation mechanism, then as an optional mechanism so as to enforce various runtime guarantees:
File.open("/tmp/foobar"){ |file|
# do whatever with file here
}
# File is guaranteed to be closed here, we didn't have to close it ourselves!
By binding block contexts, Ruby facilitates implementing tightly tailored solutions for many problem domains via DSLs. Ruby DSLs exist for web development, system orchestration, workflow management, and much more. This is not to mention the other frameworks, such as the massively popular Rails, as well as other widely-used technologies such as Metasploit.
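As a tiny illustration of the pattern, here is a hypothetical configuration DSL (not from any particular framework) that uses instance_eval to bind a block's context to the object building it:

# instance_eval binds the block's context to the object building the DSL
class Config
  def initialize; @settings = {}; end
  def set(key, value); @settings[key] = value; end

  def self.build(&block)
    config = new
    config.instance_eval(&block)  # `set` inside the block resolves against config
    config
  end
end

Config.build do
  set :port, 8080
  set :host, "localhost"
end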
Finally, programming in Ruby is just fun. The language is conducive to expressing complex concepts elegantly, jibes with many different programming paradigms and styles, and offers a quick prototype-to-production workflow that is intuitive for both novice and seasoned developers. Nothing quite scratches that itch like Ruby!

Data & Market Analysis in C++, R, and Python Sep 21, 2017
In recent years, since efforts on The Omega Project and The Guild sort of fizzled out, I've been exploring various areas of interest with no particular intent other than to play around with some ideas. Data & Financial Engineering was one of those domains and having spent some time diving into the subject (before once again moving on to something else altogether) I'm sharing a few findings here.
My journey down this path started not too long after the Bitcoin Barber Shop Pole was completed, when I was looking for a new project to occupy my free time (the little of it that I have). Having long since stepped down from the SIG315 board, but still renting a private office at the space, I was looking for some way to incorporate that into my next project (besides just using it as the occasional place to work). Brainstorming a bit, I settled on a data visualization idea, where data relating to any number of categories would be aggregated, geotagged, and then projected onto a virtual globe. I decided to use the Marble widget library, built on top of the Qt Framework, and had great success:

The architecture behind the DataChoppa project was simple: a generic 'Data' class was implemented using smart pointers, on top of which the Facet Pattern was incorporated, allowing data to be recorded from any number of sources in a generic manner and represented via convenient high level accessors. Data was collected via synchronization and generation plugins implementing a standardized interface, whose output was then fed onto a queue on which processing plugins were listening, each selecting the data it was interested in operating on. The Processors themselves could put more data onto the queue, after which the whole process was repeated ad infinitum, allowing each plugin to satisfy one bit of data-related functionality.

Core Generic & Data Classes
namespace DataChoppa{
// Generic value container
class Generic{
Map<std::string, boost::any> values;
Map<std::string, std::string> value_strings;
};
namespace Data{
/// Data representation using generic values
class Data : public Generic{
public:
Data() = default;
Data(const Data& data) = default;
Data(const Generic& generic, TYPES _types, const Source* _source) :
Generic(generic), types(_types), source(_source) {}
bool of_type(TYPE type) const;
Vector to_vector() const;
private:
TYPES types;
const Source* source;
}; // class Data
}; // namespace Data
}; // namespace DataChoppa
The Process Loop
namespace DataChoppa {
namespace Framework{
void Processor::process_next(){
if(to_process.empty()) return;
Data::Data data = to_process.first();
to_process.pop_front();
Plugins::Processors::iterator plugin = plugins.begin();
while(plugin != plugins.end()) {
Plugins::Meta* meta = dynamic_cast<Plugins::Meta*>(*plugin);
//LOG(debug) << "Processing " << meta->id;
try{
queue((*plugin)->process(data));
}catch(const Exceptions::Exception& e){
LOG(warning) << "Error when processing: " << e.what()
<< " via " << meta->id;
}
plugin++;
}
}
}; /// namespace Framework
}; /// namespace DataChoppa
The HTTP Plugin (abridged)
namespace DataChoppa {
namespace Plugins{
class HTTP : public Framework::Plugins::Syncer,
public Framework::Plugins::Job,
public Framework::Plugins::Meta {
public:
/// ...
/// sync - always return data to be added to queue, even on error
Data::Vector sync(){
String _url = url();
Network::HTTP::SyncRequest request(_url, request_timeout);
for(const Network::HTTP::Header& header : headers())
request.header(header);
int attempted = 0;
Network::HTTP::Response response(request);
while(attempts == -1 || attempted < attempts){
++attempted;
try{
response.update_from(request.request(payload()));
}catch(Exceptions::Timeout){
if(attempted == attempts){
Data::Data result = response.to_error_data();
result.source = &source;
return result.to_vector();
}
}
if(response.has_error()){
if(attempted == attempts){
Data::Data result = response.to_error_data();
result.source = &source;
return result.to_vector();
}
}else{
Data::Data result = response.to_data();
result.source = &source;
return result.to_vector();
}
}
/// we should never get here
return Data::Vector();
}
};
}; // namespace Plugins
}; // namespace DataChoppa
Overall I was pleased with the result (and perhaps I should have stopped there...). The application collected and aggregated data from many sources including RSS feeds (Google News, Reddit, etc), weather sources (Yahoo Weather, weather.com), social networks (Facebook, Twitter, Meetup, LinkedIn), chat protocols (IRC, Slack), financial sources, and much more. While exploring the last of these, I discovered the world of technical analysis and began incorporating various market indicators into a financial analysis plugin for the project.
The Market Analysis Architecture
Aroon Indicator (for example)
namespace DataChoppa{
namespace Market {
namespace Annotators {
class Aroon : public Annotator {
public:
double aroon_up(const Quote& quote, int high_offset, double range){
return ((range-1) - high_offset) / (range-1) * 100;
}
DoubleVector aroon_up(const Quotes& quotes, const Annotations::Extrema* extrema, int range){
return quotes.collect<DoubleVector>([extrema, range](const Quote& q, int i){
return aroon_up(q, extrema->high_offsets[i], range);
});
}
double aroon_down(const Quote& quote, int low_offset, double range){
return ((range-1) - low_offset) / (range-1) * 100;
}
DoubleVector aroon_down(const Quotes& quotes, const Annotations::Extrema* extrema, int range){
return quotes.collect<DoubleVector>([extrema, range](const Quote& q, int i){
return aroon_down(q, extrema->low_offsets[i], range);
});
}
AnnotationList annotate() const{
const Quotes& quotes = market->quotes;
if(quotes.size() < range) return AnnotationList();
const Annotations::Extrema* extrema = aroon_extrema(market, range);
Annotations::Aroon* aroon = new Annotations::Aroon(range);
aroon->upper = aroon_up(market->quotes, extrema, range);
aroon->lower = aroon_down(market->quotes, extrema, range);
return aroon->to_list();
}
}; /// class Aroon
}; /// namespace Annotators
}; /// namespace Market
}; // namespace DataChoppa
The whole thing worked great: data was pulled in, both real-time and historical, from Yahoo Finance (until they discontinued it... from then on it was Google Finance), the indicators were run, and results were output. Of course, making $$$ is not as simple as just crunching numbers, and being rather naive I just tossed the results of the indicators into weighted "buckets" and backtested using simple boolean flags derived from comparing the computed signals against threshold values. Thankfully I did backtest, though, as the performance was horrible: losses greatly exceeded profits :-(
At this point I should take a step back and note that my progress so far was the result of the availability of a lot of great resources (we really live in the age of accelerated learning). Specifically, the following are indispensable books & sites for those interested in this subject:
- stockcharts.com - Information on any indicator can be found on this site with details on how it is computed and how it can be used
- investopedia - Sort of the Wikipedia of investment knowledge, offers great high level insights into how market works and the financial world as it stands
- Beyond Candlesticks - Though candlestick patterns have limited use, this is a great intro to the subject and provides a good introduction to reading charts.
- Nerds on Wall Street - A great book detailing the history of computational finance. Definitely a must-read if you are new to the domain, as it provides a concise high level history of how markets have worked over the last few centuries and the various computational techniques employed to Seek Alpha
- High Probability Trading - Provides insights as to the mentality and common pitfalls when trading.



The last book is an excellent resource which conveys the importance of money and risk management, as well as the necessity to combine all factors, or as many factors as you can, when making financial decisions. In the end, I feel this is the gist of it: it's not solely a matter of luck (though there is an aspect of that to this), but rather patience, discipline, balance, and most importantly focus (similar to Aikido, but that's a topic for another time). There is no shorting it (unless you're talking about the assets themselves!), and if one does not have / take the necessary time to research and properly plan out and execute strategies, they will most likely fail (as most do, according to the numbers).
It was at this point that I decided to take a step back and restrategize, and having reflected and discussed it with some acquaintances, I hedged my bets, cut my losses (tech-wise) and switched from C++ to another platform which would allow me to prototype and execute ideas quicker. A good amount of time had gone into the C++ project and it worked great, but it did not make sense to continue with a slower development cycle when faster options are available (and after all, every engineer knows time is our most precious resource).
Python and R are the natural choices for this project domain, as there is extensive support in both languages for market analysis, backtesting, and execution. I have used Python at various points in the past so it was easy to hit the ground running; R was new, but by this time no language really poses a serious surprise. The best way I can describe it is spreadsheets on steroids (not exactly, as rather than spreadsheets, data frames and matrices are the core components, but one can imagine R as being similar to the central execution environment behind Excel, Matlab, or other statistical software).
I quickly picked up quantmod and prototyped some volatility, trend-following, momentum, and other analysis signal generators in R, plotting them using the provided charting interface. R is a great language for this sort of data manipulation: one can quickly load up structured data from CSV files or online resources, slice it and dice it, chunk it and dunk it, organize it and prioritize it, according to any arithmetic, statistical, or linear/non-linear means one desires. Loading a new 'view' on the data is as simple as a line of code, and operations can quickly be chained together at high performance.
Volatility indicator in R (consolidated)
quotes <- load_default_symbol("volatility")
quotes.atr <- ATR(quotes, n=ATR_RANGE)
quotes.atr$tr_atr_ratio <- quotes.atr$tr / quotes.atr$atr
quotes.atr$is_high <- ifelse(quotes.atr$tr_atr_ratio > HIGH_LEVEL, TRUE, FALSE)
# Also Generate ratio of atr to close price
quotes.atr$atr_close_ratio <- quotes.atr$atr / Cl(quotes)
# Generate rising, falling, sideways indicators by calculating slope of ATR regression line
atr_lm <- list()
atr_lm$df <- data.frame(quotes.atr$atr, Time = index(quotes.atr))
atr_lm$model <- lm(atr ~ poly(Time, POLY_ORDER), data = atr_lm$df) # polynomial linear model
atr_lm$fit <- fitted(atr_lm$model)
atr_lm$diff <- diff(atr_lm$fit)
atr_lm$diff <- as.xts(atr_lm$diff)
# Current ATR / Close Ratio
quotes.atr.abs_per <- median(quotes.atr$atr_close_ratio[!is.na(quotes.atr$atr_close_ratio)])
# plots
chartSeries(quotes.atr$atr)
addLines(predict(atr_lm$model))
addTA(quotes.atr$tr, type="h")
addTA(as.xts(as.logical(quotes.atr$is_high), index(quotes.atr)), col=col1, on=1)
While it all works great, the R language itself offers very little syntactic sugar for operations not related to data processing. While there are libraries for most common functionality found in many other execution environments, languages such as Ruby and Python offer a "friendlier" experience to both novice and seasoned developers alike. Furthermore, the process of data synchronization was a tedious step; I was looking for something that offered the flexibility of DataChoppa to pull in and process live and historical data from a wide variety of sources, caching results on the fly and using those results and analysis for subsequent operations.
This all led me to developing a series of Python libraries targeted towards providing a configurable high level view of the market. Intelligence Amplification (IA) as opposed to Artificial Intelligence (AI), if you will (see Nerds on Wall Street).
marketquery.py is a high level market querying library, which implements plugins used to resolve generic market queries for time-based ticker data. One can use the interface to query for the latest quotes or a specific range of them from a particular source, or allow the framework to select one for you.
Retrieve first 3 months of the last 5 years of GBPUSD data
from marketquery.querier import Querier
from marketbase.query.builder import QueryBuilder
sym = "GBPUSD"
first_3mo_of_last_5yr = (QueryBuilder().symbol(sym)
.first("3months_of_year")
.last("5years")
.query)
querier = Querier()
res = querier.run(first_3mo_of_last_5yr)
for query, dat in res.items():
    print(query)
    print(dat.raw[:1000] + (dat.raw[1000:] and '...'))
Retrieve the last two months of hourly EURJPY data
from marketquery.querier import Querier
from marketbase.query.builder import QueryBuilder
sym = "EURJPY"
two_months_of_hourly = (QueryBuilder().symbol(sym)
.last("2months")
.interval("hourly")
.query)
querier = Querier()
res = querier.run(two_months_of_hourly).raw()
print(res[:1000] + (res[1000:] and '...'))
This provides a quick way to both lookup market data according to specific criteria, as well as cache it so that network resources are used effectively. All caching is configurable, and the user can define timeouts based on the target query, source, and/or data retrieved.
From there, the next level up is technical analysis. It was trivial to whip up the tacache.py module, which uses the marketquery.py interface to retrieve raw data before feeding it into TALib and caching the results. The same caching mechanism offering the same flexibility is employed: if one needs to process a large data set and/or subsets multiple times in a specified period, computational resources are not wasted (important when running on a metered cloud)
Computing various technical indicators
from marketquery.querier import Querier
from marketbase.query.builder import QueryBuilder
from tacache.runner import TARunner
from tacache.source import Source
from tacache.indicator import Indicator
from talib import SMA
from talib import MACD
###
res = Querier().run(QueryBuilder().symbol("AUDUSD")
.query)
###
ta_runner = TARunner()
analysis = ta_runner.run(Indicator(SMA),
query_result=res)
print(analysis.raw)
analysis = ta_runner.run(Indicator(MACD),
query_result=res)
macd, sig, hist = analysis.raw
print(macd)
Finally, on top of all this I wrote a2m.py, a high level querying interface consisting of modules reporting on market volatility and trends as well as other metrics; python scripts which I could quickly execute to report the current and historical market state, making use of the underlying cached query and technical analysis data, periodically invalidated to pull in new/recent live data.
Example using a2m to compute volatility
sym = "EURUSD"
resolver = Resolver()
ta_runner = TARunner()
daily = (QueryBuilder().symbol(sym)
.interval("daily")
.last("year")
.query)
hourly = (QueryBuilder().symbol(sym)
.interval("hourly")
.last("3months")
.latest()
.query)
current = (QueryBuilder().symbol(sym)
.latest()
.data_dict()
.query)
daily_quotes = resolver.run(daily)
hourly_quotes = resolver.run(hourly)
current_quotes = resolver.run(current)
daily_avg = ta_runner.run(Indicator(talib.SMA, timeperiod=120), query_result=daily_quotes).raw[-1]
hourly_avg = ta_runner.run(Indicator(talib.SMA, timeperiod=30), query_result=hourly_quotes).raw[-1]
current_val = current_quotes.raw()[-1]['Close']
daily_percent = current_val / daily_avg if current_val < daily_avg else daily_avg / current_val
hourly_percent = current_val / hourly_avg if current_val < hourly_avg else hourly_avg / current_val

I would go on to use this to execute some Forex trades, again not in an algorithmic / automated manner, but rather based on combined knowledge from fundamentals research as well as the high level technical data, and what was the result...

I jest; though I did lose a little $$$, it wasn't that much, and to be honest I feel this was due to lack of patience/discipline and other "novice" mistakes as discussed above. I did make about half of it back, and then lost interest. This all requires a lot of focus and time, and I had already spent 2+ years worth of free time on this. With many other interests pulling at my strings, I decided to sideline the project(s) altogether and focus on my next crazy venture.
TLDR;
After some consideration, I decided to release the R code I wrote under the MIT license. They are rather simple experiments, though they could be useful as a starting point for others new to the subject. As far as the Python modules and DataChoppa go, I intend to eventually release them, but aim to take a break first to focus on other efforts, then go back to the war room to figure out the next stage of the strategy.
And that's that! Enough number crunching, time to go out for a hike!

ReFS Part III - Back to the Resilience Jul 5, 2017
We've made some great headway on the ReFS filesystem analysis front, to the point of being able to implement a rudimentary file extraction mechanism (complete with timestamps).
First a recap of the story so far:
- ReFS, aka "The Resilient FileSystem" is a relatively new filesystem developed by Microsoft. First shipped in Windows Server 2012, it has since seen an increase in popularity and use, especially in enterprise and cloud environments.
- Little is known about the ReFS internals outside of some sparse information provided by Microsoft. According to that, data is organized into pages of a fixed size, starting at a static position on the disk. The first round of analysis was to determine the boundaries of these top level organizational units to be able to scan the disk for high level structures.
- Once top level structures, including the object table and root directory, were identified, each was analyzed in detail to determine potential parsable structures such as generic Attribute and Record entities as well as file and directory references.
- The latest round of analysis consisted of diving into these entities in detail to try and deduce a mechanism by which to extract file metadata and content
Before going into details, we should note this analysis is based on observations against ReFS disks generated locally, without extensive sequential cross-referencing and comparison of many large files with many changes. Also it is possible that some structures are oversimplified and/or not fully understood. That being said, this should provide a solid basis for additional analysis, getting us deep into the filesystem, and allowing us to poke and prod with isolated bits to identify their semantics.
Now onto the fun stuff!
- A ReFS filesystem can be identified with the following signature at the very start of the partition:
00 00 00 52 65 46 53 00 00 00 00 00 00 00 00 00 ...ReFS.........
46 53 52 53 XX XX XX XX XX XX XX XX XX XX XX XX FSRS
- The following Ruby code will tell you if a given offset in a given file contains a ReFS partition:
# Point this to the file containing the disk image
DISK="~/ReFS-disk.img"
# Point this at the start of the partition containing the ReFS filesystem
ADDRESS=0x500000
# FileSystem Signature we are looking for
FS_SIGNATURE = [0x00, 0x00, 0x00, 0x52, 0x65, 0x46, 0x53, 0x00] # ...ReFS.
img = File.open(File.expand_path(DISK), 'rb')
img.seek ADDRESS
sig = img.read(FS_SIGNATURE.size).unpack('C*')
puts "Disk #{sig == FS_SIGNATURE ? "contains" : "does not contain"} ReFS filesystem"
- ReFS pages are 0x4000 bytes in length
- On all inspected systems, the first page number is 0x1e (0x78000 bytes after the start of the partition containing the filesystem). This is in line w/ Microsoft documentation, which states that the first metadata dir is at a fixed offset on the disk.
- Other pages contain various system, directory, and volume structures and tables as well as journaled versions of each page (shadow-written upon regular disk writes)
- The first byte of each page is its Page Number
- The first 0x30 bytes of every metadata page (dubbed the Page Header) seem to follow a certain pattern:
byte 0: XX XX 00 00 00 00 00 00 YY 00 00 00 00 00 00 00
byte 16: 00 00 00 00 00 00 00 00 ZZ ZZ 00 00 00 00 00 00
byte 32: 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
- dword 0 (XX XX) is the page number which is sequential and corresponds to the 0x4000 offset of the page
- dword 2 (YY) is the journal number or sequence number
- dword 6 (ZZ ZZ) is the "Virtual Page Number", which is non-sequential (eg values are in no apparent order) and seems to tie related pages together.
- dword 8 is always 01, perhaps an "allocated" flag or other
- Multiple pages may share a virtual page number (byte 24/dword 6) but usually don't appear in sequence.
- The following Ruby code will print out the pages in a ReFS partition along w/ their shadow copies:
# Point this to the file containing the disk image
DISK="~/ReFS-disk.img"
# Point this at the start of the partition containing the ReFS filesystem
ADDRESS=0x500000
PAGE_SIZE=0x4000
PAGE_SEQ=0x08
PAGE_VIRTUAL_PAGE_NUM=0x18
FIRST_PAGE = 0x1e
img = File.open(File.expand_path(DISK), 'rb')
page_id = FIRST_PAGE
img.seek(ADDRESS + page_id*PAGE_SIZE)
while contents = img.read(PAGE_SIZE)
id = contents.unpack('S').first
if id == page_id
pos = img.pos
start = ADDRESS + page_id * PAGE_SIZE
img.seek(start + PAGE_SEQ)
seq = img.read(4).unpack("L").first
img.seek(start + PAGE_VIRTUAL_PAGE_NUM)
vpn = img.read(4).unpack("L").first
print "page: "
print "0x#{id.to_s(16).upcase}".ljust(7)
print " @ "
print "0x#{start.to_s(16).upcase}".ljust(10)
print ": Seq - "
print "0x#{seq.to_s(16).upcase}".ljust(7)
print "/ VPN - "
print "0x#{vpn.to_s(16).upcase}".ljust(9)
puts
img.seek pos
end
page_id += 1
end
- The object table (virtual page number 0x02) associates object ids with the pages on which they reside. Here we see an AttributeList consisting of Records of key/value pairs (see below for the specifics of these data structures). We can look up the object id of the root directory (0x600000000) to retrieve the page on which it resides:
50 00 00 00 10 00 10 00 00 00 20 00 30 00 00 00 - total length / key & value boundries
00 00 00 00 00 00 00 00 00 06 00 00 00 00 00 00 - object id
F4 0A 00 00 00 00 00 00 00 00 02 08 08 00 00 00 - page id / flags
CE 0F 85 14 83 01 DC 39 00 00 00 00 00 00 00 00 - checksum
08 00 00 00 08 00 00 00 04 00 00 00 00 00 00 00
^ The object table entry for the root dir, containing its page (0xAF4)
- When retrieving pages by id or virtual page number, look for the ones with the highest sequence number as those are the latest copies of the shadow-write mechanism.
- Expanding upon the previous example we can implement some logic to read and dump the object table:
ATTR_START = 0x30
def img
@img ||= File.open(File.expand_path(DISK), 'rb')
end
def pages
@pages ||= begin
_pages = {}
page_id = FIRST_PAGE
img.seek(ADDRESS + page_id*PAGE_SIZE)
while contents = img.read(PAGE_SIZE)
id = contents.unpack('S').first
if id == page_id
pos = img.pos
start = ADDRESS + page_id * PAGE_SIZE
img.seek(start + PAGE_SEQ)
seq = img.read(4).unpack("L").first
img.seek(start + PAGE_VIRTUAL_PAGE_NUM)
vpn = img.read(4).unpack("L").first
_pages[id] = {:id => id, :seq => seq, :vpn => vpn}
img.seek pos
end
page_id += 1
end
_pages
end
end
def page(opts)
if opts.key?(:id)
return pages[opts[:id]]
elsif opts[:vpn]
return pages.values.select { |v|
v[:vpn] == opts[:vpn]
}.sort { |v1, v2| v1[:seq] <=> v2[:seq] }.last
end
nil
end
def obj_pages
@obj_pages ||= begin
obj_table = page(:vpn => 2)
img.seek(ADDRESS + obj_table[:id] * PAGE_SIZE)
bytes = img.read(PAGE_SIZE).unpack("C*")
len1 = bytes[ATTR_START]
len2 = bytes[ATTR_START+len1]
start = ATTR_START + len1 + len2
objs = {}
while bytes.size > start && bytes[start] != 0
len = bytes[start]
id = bytes[start+0x10..start+0x20-1].collect { |i| i.to_s(16).upcase }.reverse.join()
tgt = bytes[start+0x20..start+0x21].collect { |i| i.to_s(16).upcase }.reverse.join()
objs[id] = tgt
start += len
end
objs
end
end
obj_pages.each { |id, tgt|
puts "Object #{id} is on page #{tgt}"
}
We could also implement a method to lookup a specific object's page:
def obj_page(obj_id)
obj_pages[obj_id]
end
puts page(:id => obj_page("0000006000000000").to_i(16))
This will retrieve the page containing the root directory
- Directories, from the root dir down, follow a consistent pattern. They are composed of sequential lists of data structures whose length is given by the first word value (Attributes and Attribute Lists).
Lists are often prefixed with a Header Attribute defining the total length of the Attributes that follow and that constitute the list, though this is not a hard-set rule, as in the case where the list resides in the body of another Attribute (more on that below).
In either case, Attributes may be parsed by iterating over the bytes after the directory page header, reading and processing the first word to determine the number of bytes to read next (minus the length of the first word), and then repeating until null (0000) is encountered (being sure to process any specified padding along the way).
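As a rough sketch of that scan (assuming img, ADDRESS, PAGE_SIZE, and ATTR_START are set up as in the earlier examples, and dir_page_id is an assumed variable holding the page number of the directory being parsed):

# Sketch of the sequential Attribute scan described above
attributes = []
img.seek(ADDRESS + dir_page_id * PAGE_SIZE + ATTR_START)
loop do
  pos    = img.pos
  packed = img.read(4)
  break if packed.nil?
  len = packed.unpack('L').first   # length prefix, as in the read_attr helper shown later
  break if len == 0                # null terminator ends the listing
  img.seek pos
  attributes << img.read(len)      # raw attribute bytes (len includes the prefix itself)
end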
- Various Attributes take on different semantics including references to subdirs and files as well as branches to additional pages containing more directory contents (for large directories); though not all Attributes have been identified.
The structures in a directory listing always seem to be of one of the following formats:
- Base Attribute - The simplest / base attribute consisting of a block whose length is given at the very start.
An example of a typical Attribute follows:
a8 00 00 00 28 00 01 00 00 00 00 00 10 01 00 00
10 01 00 00 02 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 a9 d3 a4 c3 27 dd d2 01
5f a0 58 f3 27 dd d2 01 5f a0 58 f3 27 dd d2 01
a9 d3 a4 c3 27 dd d2 01 20 00 00 00 00 00 00 00
00 06 00 00 00 00 00 00 03 00 00 00 00 00 00 00
5c 9a 07 ac 01 00 00 00 19 00 00 00 00 00 00 00
00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00
Here we see a section of 0xA8 length containing the following four file timestamps (more on this conversion below):
a9 d3 a4 c3 27 dd d2 01 - 2017-06-04 07:43:20
5f a0 58 f3 27 dd d2 01 - 2017-06-04 07:44:40
5f a0 58 f3 27 dd d2 01 - 2017-06-04 07:44:40
a9 d3 a4 c3 27 dd d2 01 - 2017-06-04 07:43:20
It is safe to assume that either:
- one of the first fields in any given Attribute contains an identifier detailing how the attribute should be parsed, or
- the context is given by the Attribute's position in the list, or
- attributes corresponding to a given meaning are referenced by address or identifier elsewhere
The following is a method which can be used to parse a given Attribute off disk, provided the img read position is set to its start:
def read_attr
pos = img.pos
packed = img.read(4)
return nil if packed.nil?
attr_len = packed.unpack('L').first
return nil if attr_len == 0
img.seek pos
value = img.read(attr_len)
Attribute.new(:pos => pos,
:bytes => value.unpack("C*"),
:len => attr_len)
end
- Records - Key / Value pairs whose total length and key / value lengths are given in the first 0x20 bytes of the attribute. These are used to associate metadata sections with files, whose names are recorded in the keys and whose contents are recorded in the values.
An example of a typical Record follows:
40 04 00 00 10 00 1A 00 08 00 30 00 10 04 00 00 @.........0.....
30 00 01 00 6D 00 6F 00 66 00 69 00 6C 00 65 00 0...m.o.f.i.l.e.
31 00 2E 00 74 00 78 00 74 00 00 00 00 00 00 00 1...t.x.t.......
A8 00 00 00 28 00 01 00 00 00 00 00 10 01 00 00 ¨...(...........
10 01 00 00 02 00 00 00 00 00 00 00 00 00 00 00 ................
00 00 00 00 00 00 00 00 A9 D3 A4 C3 27 DD D2 01 ........©Ó¤Ã'ÝÒ.
5F A0 58 F3 27 DD D2 01 5F A0 58 F3 27 DD D2 01 _ Xó'ÝÒ._ Xó'ÝÒ.
A9 D3 A4 C3 27 DD D2 01 20 00 00 00 00 00 00 00 ©Ó¤Ã'ÝÒ. .......
00 06 00 00 00 00 00 00 03 00 00 00 00 00 00 00 ................
5C 9A 07 AC 01 00 00 00 19 00 00 00 00 00 00 00 \..¬............
00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 ................
00 00 00 00 00 00 00 00 20 00 00 00 A0 01 00 00 ........ ... ...
D4 00 00 00 00 02 00 00 74 02 00 00 01 00 00 00 Ô.......t.......
78 02 00 00 00 00 00 00 ...(cutoff) x.......
Here we see the Record parameters given by the first row:
- total length - 4 bytes = 0x440
- key offset - 2 bytes = 0x10
- key length - 2 bytes = 0x1A
- flags / identifier - 2 bytes = 0x08
- value offset - 2 bytes = 0x30
- value length - 2 bytes = 0x410
Naturally, the Record finishes after the value, 0x410 bytes after the value start at 0x30, or 0x440 bytes after the start of the Record (which lines up with the total length).
We also see that this Record corresponds to a file I created on disk as the key is the File Metadata flag (0x10030) followed by the filename (mofile1.txt).
Here the first attribute in the Record value is the simple attribute we discussed above, containing the file timestamps. The File Reference Attribute List Header follows (more on that below).
From observation, Records w/ flag values of '0' or '8' are what we are looking for; while '4' occurs often, it almost always seems to indicate a Historical Record, i.e. a Record that has since been replaced with another (see the usage sketch after the Record class below).
Since Records are prefixed with their total length, they can be thought of as a subclass of Attribute. The following is a Ruby class that uses composition to dispatch record field lookup calls to values in the underlying Attribute:
class Record
attr_accessor :attribute
def initialize(attribute)
@attribute = attribute
end
def key_offset
@key_offset ||= attribute.words[2]
end
def key_length
@key_length ||= attribute.words[3]
end
def flags
@flags ||= attribute.words[4]
end
def value_offset
@value_offset ||= attribute.words[5]
end
def value_length
@value_length ||= attribute.words[6]
end
def boundries
[key_offset, key_length, value_offset, value_length]
end
def key
@key ||= begin
ko, kl, vo, vl = boundries
attribute.bytes[ko...ko+kl].pack('C*')
end
end
def value
@value ||= begin
ko, kl, vo, vl = boundries
attribute.bytes[vo..-1].pack('C*')
end
end
def value_pos
attribute.pos + value_offset
end
def key_pos
attribute.pos + key_offset
end
end # class Record
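As a hypothetical usage sketch, assuming attributes holds Attribute objects parsed from a directory page (e.g. via read_attr above), we can wrap them in Records and skip the 'historical' entries noted earlier:

# Wrap parsed attributes in Records and drop Historical Records (flag 4)
records = attributes.map { |attr| Record.new(attr) }
current = records.reject { |r| r.flags == 4 }
current.each { |r| puts "key: #{r.key.inspect}" }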
- AttributeList - These are more complicated but interesting. At first glance they are simple Attributes of length 0x20, but upon further inspection we consistently see that they contain the length of a larger block of Attributes (this length is inclusive, as it covers this first one). After parsing this Attribute, dubbed the 'List Header', we should read the remaining bytes in the List as well as the padding before arriving at the next Attribute
20 00 00 00 A0 01 00 00 D4 00 00 00 00 02 00 00 <- list header specifying total length (0x1A0) and padding (0xD4)
74 02 00 00 01 00 00 00 78 02 00 00 00 00 00 00
80 01 00 00 10 00 0E 00 08 00 20 00 60 01 00 00
60 01 00 00 00 00 00 00 80 00 00 00 00 00 00 00
88 00 00 00 ... (cutoff)
Here we see an Attribute of 0x20 length, that contains a reference to a larger block size (0x1A0) in its third word.
This can be confirmed by the next Attribute, whose size (0x180) is the larger block size minus the length of the header (0x1A0 - 0x20). In this case the list only contains one item/child attribute.
In general a simple strategy to parse the entire list would be to:
- Parse Attributes individually as normal
- If we encounter a List Header Attribute, we calculate the size of the list (total length minus header length)
- Then continue parsing Attributes, adding them to the list until the total length is completed.
It also seems that:
- the padding that occurs after the list is given by header word number 5 (in this case 0xD4). After the list is parsed, we consistently see this many null bytes before the next Attribute begins (which is not part of & unrelated to the list).
- the type of list is given by its 7th word; directory contents correspond to 0x200 while directory branches are indicated with 0x301
Here is a class that represents an AttributeList header attribute by encapsulating it in a similar manner to Record above:
class AttributeListHeader
attr_accessor :attribute
def initialize(attr)
@attribute = attr
end
# From my observations this is always 0x20
def len
@len ||= attribute.dwords[0]
end
def total_len
@total_len ||= attribute.dwords[1]
end
def body_len
@body_len ||= total_len - len
end
def padding
@padding ||= attribute.dwords[2]
end
def type
@type ||= attribute.dwords[3]
end
def end_pos
@end_pos ||= attribute.dwords[4]
end
def flags
@flags ||= attribute.dwords[5]
end
def next_pos
@next_pos ||= attribute.dwords[6]
end
end
Here is a method to parse the actual Attribute List assuming the image read position is set to the beginning of the List Header
def read_attribute_list
header = AttributeListHeader.new(read_attr)
remaining_len = header.body_len
orig_pos = img.pos
bytes = img.read remaining_len
img.seek orig_pos
attributes = []
until remaining_len == 0
attributes << read_attr
remaining_len -= attributes.last.len
end
img.seek orig_pos - header.len + header.end_pos
AttributeList.new :header => header,
:pos => orig_pos,
:bytes => bytes,
:attributes => attributes
end
Now we have most of what is needed to locate and parse individual files, but there are a few missing components including:
- Directory Tree Branches: These are Attribute Lists where each Attribute corresponds to a record whose value references a page which contains more directory contents.
Upon encountering an AttributeList header with flag value 0x301, we should
- iterate over the Attributes in the list,
- parse them as Records,
- use the first dword in each value as the page to repeat the directory traversal process (recursively).
Additional files and subdirs found on the referenced pages should be appended to the list of current directory contents (a rough traversal sketch follows this list item).
Note this appears to be the implementation (or at least an implementation) of the BTree structure in the ReFS filesystem described by Microsoft, as the record keys contain the tree leaf identifiers (based on file and subdirectory names).
This can be used for quick / efficient file and subdir lookup by name (see 'optimization' in 'next steps' below)
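Here is a rough sketch of that traversal, using the branch? helper from the AttributeList sketch above. Record#value returning the raw value bytes, along with the seek_to_page and read_directory_contents helpers, are assumptions standing in for logic covered elsewhere in this post:
# Sketch of recursive directory-branch expansion: each child Attribute of a
# branch list (type 0x301) is parsed as a Record whose value's first dword
# references a page holding more directory contents.
def expand_branches(attribute_list)
  return [] unless attribute_list.branch?

  attribute_list.attributes.flat_map do |attr|
    record = Record.new(attr)
    page   = record.value.unpack("L<").first # first dword of the value
    seek_to_page page        # assumed helper: position img at that page
    read_directory_contents  # assumed helper: parses the files/subdirs there,
                             # recursing into any nested branches it finds
  end
end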
- SubDirectories: these are simply Records in the directory's Attribute List whose key contains the Directory Metadata flag (0x20030) as well as the subdir name.
The value of this Record is the corresponding object id, which can be used to look up the page containing the subdir in the object table.
A typical subdirectory Record (a parsing sketch follows the dump):
70 00 00 00 10 00 12 00 00 00 28 00 48 00 00 00
30 00 02 00 73 00 75 00 62 00 64 00 69 00 72 00 <- here we see the key containing the flag (30 00 02 00) followed by the dir name ("subdir2")
32 00 00 00 00 00 00 00 03 07 00 00 00 00 00 00 <- here we see the object id as the first qword in the value (0x703)
00 00 00 00 00 00 00 00 14 69 60 05 28 dd d2 01 <- here we see the directory timestamps (more on those below)
cc 87 ce 52 28 dd d2 01 cc 87 ce 52 28 dd d2 01
cc 87 ce 52 28 dd d2 01 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 10 00 00 00 00
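A hypothetical helper to pull those two pieces out of a parsed Record might look like the following; Record#key and Record#value returning raw byte strings is an assumption on my part:
DIR_METADATA_FLAG = 0x20030 # key flag observed on subdirectory Records

# Sketch of extracting the name and object id from a subdirectory Record.
def parse_subdir(record)
  flag = record.key.unpack("L<").first
  return nil unless flag == DIR_METADATA_FLAG

  # the remainder of the key is the UTF-16LE encoded directory name
  name = record.key[4..-1]
               .force_encoding("UTF-16LE")
               .encode("UTF-8", :invalid => :replace)
               .delete("\0")

  # the first qword of the value is the object id, usable for an object
  # table lookup to find the page containing the subdirectory
  object_id = record.value.unpack("Q<").first

  { :name => name, :object_id => object_id }
end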
- Files: like directories, these are Records whose key contains a flag (0x10030) followed by the filename.
The value is far more complicated though, and while we've discovered some basic Attributes allowing us to pull timestamps and content from the fs, there is still more to be deduced as far as the semantics of this Record's value.
- The File Record value consists of multiple attributes, though they just appear one after another, without a List Header. We can still parse them sequentially given that all Attributes are individually prefixed with their lengths and the File Record value length gives us the total size of the block.
- The first attribute contains 4 file timestamps at an offset given by the fifth byte of the attribute (though this position may be coincidental and the timestamps could just reside at a fixed location in this attribute).
In the first attribute example above we see the first timestamp is
a9 d3 a4 c3 27 dd d2 01
This corresponds to the following date
2017-06-04 07:43:20
It may be converted with the following algorithm:
# TIMESTAMP_BYTES is the array of the 8 raw timestamp bytes shown above
tsi = TIMESTAMP_BYTES.pack("C*").unpack("Q*").first
Time.at(tsi / 10000000 - 11644473600)
Timestamps are stored as 100-nanosecond intervals since the Windows epoch (Jan 1, 1601 UTC); 11644473600 is the offset in seconds between that epoch and the Unix epoch. A standalone version of this conversion is sketched below.
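Wrapped up as a standalone helper (the method and constant names here are my own), the conversion of the example timestamp above looks like this:
# Seconds between the Windows epoch (Jan 1, 1601) and the Unix epoch (Jan 1, 1970)
WINDOWS_TO_UNIX_EPOCH_SECS = 11_644_473_600

# Convert 8 raw timestamp bytes (100-nanosecond intervals since the
# Windows epoch, little-endian) into a Ruby Time
def parse_timestamp(bytes)
  raw = bytes.pack("C*").unpack("Q<").first
  Time.at(raw / 10_000_000 - WINDOWS_TO_UNIX_EPOCH_SECS)
end

parse_timestamp [0xa9, 0xd3, 0xa4, 0xc3, 0x27, 0xdd, 0xd2, 0x01]
# => 2017-06-04 07:43:20 (local time, matching the date shown above)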
- The second Attribute seems to be the Header of an Attribute List containing the 'File Reference' semantics. These are the Attributes that encapsulate the file length and content pointers.
I'm assuming this is an Attribute List so as to contain many of these types of Attributes for large files. What is not apparent are the full semantics of all of these fields.
But here is where it gets complicated: this List only contains a single attribute, which itself has a few child Attributes. This encapsulation seems to work in the same manner as the Attributes stored in the File Record value above, a simple sequential collection without a Header.
In this single attribute (dubbed the 'File Reference Body'), the first child Attribute contains the length of the file while the second is the Header for yet another List, this one containing a Record whose value contains a reference to the page on which the file contents actually reside.
----------------------------------------
| ...                                  |
----------------------------------------
| File Entry Record                    |
|  Key: 0x10030 [FileName]             |
|  Value:                              |
|   Attribute1: Timestamps             |
|   Attribute2:                        |
|    File Reference List Header        |
|    File Reference List Body(Record)  |
|     Record Key: ?                    |
|     Record Value:                    |
|      File Length Attribute           |
|      File Content List Header        |
|      File Content Record(s)          |
|    Padding                           |
----------------------------------------
| ...                                  |
----------------------------------------
While complicated, each level can be parsed in a similar manner to all other Attributes & Records, just taking care to parse Attributes into their correct levels & structures.
As far as actual values,
- the file length is always seen at a fixed offset within its attribute (0x3c) and
- the content pointer seems to always reside in the second qword of the Record value. This pointer is simply a reference to the page from which the file contents can be read verbatim (a hypothetical sketch tying all of this together follows).
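Tying this together, a sketch of reading a file's contents out of these parsed structures might look as follows. The accessors on the parsed file record (reference_body, len_attribute, content_record) and the seek_to_page helper are all assumptions layered on the layout described above, not confirmed names:
FILE_LEN_OFFSET = 0x3c # observed fixed offset of the length within its attribute

# Sketch of reading a file's bytes via the structures diagrammed above.
def read_file_contents(file_record)
  body = file_record.reference_body # the 'File Reference Body'

  # file length from its fixed offset in the first child attribute
  length = body.len_attribute.bytes[FILE_LEN_OFFSET, 8].unpack("Q<").first

  # content pointer from the second qword of the content Record's value
  page = body.content_record.value[8, 8].unpack("Q<").first

  seek_to_page page # assumed helper: position img at the referenced page
  img.read length   # file contents are stored there verbatim
end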
And that's it! An example implementation of all this logic can be seen in our experimental 'resilience' library found here:
https://github.com/movitto/resilience
The next steps would be to:
- expand upon the data structures above (verify that we have interpreted the existing structures correctly)
- deduce full Attribute and Record semantics so as to be able to consistently parse files of any given length, with any given number of modifications out of the file system
And once we have done so robustly, we can start looking at optimization, possibly rolling out some experimental production logic for ReFS filesystem support!
... Cha-ching $ £ ¥ ¢ ₣ ₩ !!!!
RetroFlix / PI Switch Followup Jun 24, 2017
I've been trying to dedicate some cycles to wrapping up the Raspberry PI entertainment center project mentioned a while back. I decided to abandon the PI Switch idea as the original controller purchased for it just did not work properly (or should I say, only worked sporadically / intermittently). It being a cheap device bought online, it wasn't worth the effort to debug (funny enough, I can't find the device on Amazon anymore; perhaps other people were having issues...).
Not being able to find another suitable gamepad to use as the basis for a snap-together portable device, I bought a Rii wireless controller (which works great out of the box!) and dropped the project (also partly due to lack of personal interest). But the previously designed wall mount works great, and after a bit of work the PI now functions as a seamless media center.
Unfortunately to get it there, a few workarounds were needed. These are listed below (in no particular order).
- To start off, increase your GPU memory. This will be needed to run games with any reasonable performance, and can be accomplished through the Raspberry PI configuration interface.
Here you can also overclock your PI if your model supports it (v3.0 does not, though there are workarounds).
- If you are having trouble w/ the PI output resolution being too large / small for your tv, try adjusting the aspect ratio on your set. Previously mine was set to "theater mode", cutting off the edges of the device output. Resetting it to normal resolved the issue.
- Getting the Playstation SixAxis controller working via bluetooth required a few steps:
- Unplug your playstation (since it will boot by default when the controller is activated)
- On the PI, run
sudo bluetoothctl
- Start the controller and watch for a new device in the bluetoothctl output. Make note of the device id
- Still in the bluetoothctl command prompt, run
trust [deviceid]
- In the Raspberry PI bluetooth menu, click 'make discoverable' (this can also be accomplished via the bluetoothctl command prompt with the 'discoverable on' command)
- Finally restart the controller and it should autoconnect!
- To install recent versions of Ruby you will need to install and set up rbenv. The current version in the RPI repos is too old to be of use (this is of interest for RetroFlix, see below)
- Using mednafen requires some config changes, notably to disable opengl output and enable SDL. Specifically, change the following line from
video.driver opengl
to
video.driver sdl
- Unfortunately, after a lot of effort I was not able to get mupen64 working (while most games start, as confirmed by audio cues, all have black / blank screens)... so no N64 games on the PI for now ☹
- But who needs N64 when you have Nethack! ♥‿♥ (the most recent version of which works flawlessly). In addition to the small tweaks needed to compile the latest version on Linux, in order to get the awesome Nevanda tileset working, update include/config.h to enable XPM graphics:
-/* # define USE_XPM */ /* Disable if you do not have the XPM library */
+#define USE_XPM /* Disable if you do not have the XPM library */
Once installed, edit your nh/install/games/lib/nethackdir/NetHack.ad config file (in ~ if you installed nethack there) to reference the new tileset:
-NetHack.tile_file: x11tiles
+NetHack.tile_file: /home/pi/Downloads/Nevanda.xpm
Finally, RetroFlix received some tweaking & love. Most changes were visual optimizations and eye candy (including some nice retro fonts and colors), though workers were also added so the actual downloads could be performed without blocking the UI. Overall it's simple and works great, a perfect portal for working on those high scores!
That's all for now, look for some more updates on the ReFS front in the near future!