DNA representational image. | Photo: Pixabay

Bengaluru: Alphanumeric codes given to about 27 human genes have been tweaked over the past year because Microsoft Excel, a powerful tool to assess and plot complex data, would not stop mistaking them for dates. 

The changes were formalised as part of new guidelines issued earlier this month as scientists finally addressed a years-old problem that may seem innocuous at first but has the potential to corrupt research. 

The 27 genes with revised names include Membrane Associated Ring-CH-Type Finger 1, denoted by the symbol MARCH1, and Septin-2, which is better known as SEPT2. Under Excel's auto-formatting, each of these symbols would be converted to a date. 

The problem has stalked researchers long enough that workarounds have been devised and apps reportedly made to tackle it. But the new nomenclature rules strike the problem at its root, and many researchers have taken to social media to express their joy.

https://twitter.com/HegdeMudra/status/1290969706044719104?s=20

https://twitter.com/jjvincent/status/1291363569951551490?s=20



An evolving process

Excel is a commonly used data tool, but formatting errors can corrupt biological data. Then there is the obvious frustration caused by unintended format changes. 

Apart from dates, Excel has also been known to convert the names of some genes like ‘2310009E13’ to the floating-point format — in this case, to ‘2.31E+13’. 
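For readers who handle gene lists, the two failure modes described above can be screened for before a file is ever opened in a spreadsheet. The following is a minimal sketch in Python. The function name `excel_risk` and both patterns are illustrative assumptions, not Excel's actual parsing rules, which are more extensive.

```python
import re

# Month names and abbreviations assumed to trigger date conversion
# when followed by a one- or two-digit number (illustrative list).
MONTHS = ("JAN", "FEB", "MAR", "MARCH", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "SEPT", "OCT", "NOV", "DEC")

# e.g. MARCH1, SEPT2, DEC-1
DATE_LIKE = re.compile(r"^(?:%s)-?\d{1,2}$" % "|".join(MONTHS), re.IGNORECASE)

# Digits, the letter E, then more digits, e.g. 2310009E13
FLOAT_LIKE = re.compile(r"^\d+E\d+$", re.IGNORECASE)


def excel_risk(symbol: str) -> str:
    """Classify a gene symbol as 'date', 'float' or 'safe' (hypothetical helper)."""
    s = symbol.strip()
    if DATE_LIKE.match(s):
        return "date"
    if FLOAT_LIKE.match(s):
        return "float"
    return "safe"


if __name__ == "__main__":
    for sym in ["MARCH1", "SEPT2", "2310009E13", "MARCHF1", "SEPTIN1"]:
        print(sym, excel_risk(sym))
```

Under these assumed patterns, the old symbols MARCH1 and SEPT2 are flagged as date-like, while the revised symbols MARCHF1 and SEPTIN1 pass as safe.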

A 2016 study found that Excel had converted gene names to dates and floating-point numbers in approximately one-fifth of 3,597 published papers. 

The naming convention for genes is overseen by the HUGO Gene Nomenclature Committee (HGNC), which currently holds a database of around 33,000 gene symbols and names that belong to over 1,300 gene families. 

The new set of guidelines issued by the HGNC mandates that gene symbols be chosen so that Excel's auto-formatting does not alter them. To this end, MARCH1 is now MARCHF1 and SEPT1 is SEPTIN1. 

In the past, gene names have also been changed for other reasons, HGNC coordinator Elspeth Bruford was quoted as saying in a report on The Verge. Names that can be confused with other words, for example, have been tweaked to avoid false positives in text searches — so, CARS became CARS1, and WARS, WARS1. 
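Anyone with older gene lists may want to bring them in line with the revised symbols. The sketch below, in Python, uses only the four renames mentioned in this report; the table `RENAMED` and the function `update_symbols` are hypothetical names, and a real update would rely on the full HGNC database rather than a hand-typed table.

```python
# Old symbol -> new symbol, limited to the renames cited in this article.
RENAMED = {
    "MARCH1": "MARCHF1",
    "SEPT1": "SEPTIN1",
    "CARS": "CARS1",
    "WARS": "WARS1",
}


def update_symbols(symbols):
    """Return a new list with known old symbols replaced; others are left as-is."""
    return [RENAMED.get(s, s) for s in symbols]


if __name__ == "__main__":
    print(update_symbols(["MARCH1", "SEPTIN1", "CARS"]))
    # prints ['MARCHF1', 'SEPTIN1', 'CARS1']
```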

Rules have also been altered to tackle certain creative liberties that defined the process of gene-naming in the earlier days, as well as to eliminate any prospect of offence. “Headcase homolog (Drosophila)” was thus changed to hdc homolog, cell cycle regulator, and “ARS” to ARS1. 

Genes were historically given unique or funny symbols, such as ‘tinman’, a gene required for the heart that was named after the Wizard of Oz character who craved a heart, ‘NEMO’ for NF-kappa-B essential modulator, ‘Indy’ for ‘I’m Not Dead Yet’, and ‘Pokemon’ (now changed to Zbtb7) for POK erythroid myeloid ontogenic factor. 

However, new symbols are strictly regulated by HGNC. They must contain only Latin letters and Arabic numerals with no sub- or superscript. They should not spell out names, especially offensive ones, in any language. Whimsical and funny are out, too. 
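The character rule described above is simple enough to express as a check. This is a minimal sketch in Python based only on the description in this report (Latin letters and Arabic numerals, no sub- or superscript); the function name `is_plain_symbol` is an assumption, and the full HGNC guidelines contain further rules that are not modelled here.

```python
import re

# Only unaccented Latin letters and the digits 0-9 are accepted.
PLAIN = re.compile(r"[A-Za-z0-9]+")


def is_plain_symbol(symbol: str) -> bool:
    """True if the symbol uses only Latin letters and Arabic numerals (hypothetical helper)."""
    return PLAIN.fullmatch(symbol) is not None


if __name__ == "__main__":
    for sym in ["MARCHF1", "SEPTIN1", "SEPT-2", "H\u2082"]:
        print(sym, is_plain_symbol(sym))
```

An explicit character class is used instead of Python's `str.isalnum()`, because `isalnum()` also accepts accented letters and superscript digits, which the rule excludes.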

Bruford said in The Verge piece that there has been some dissent among researchers over the change, with some questioning why Excel couldn’t do something to address their concerns. However, she said, the community affected by this problem is too small for Excel to effect change in software that is used “extremely widely” by a “massive community”.



 
