Record-heterogeneity

We think of CSV tables as rectangular: if there are 17 columns in the header, then there are 17 columns in every row; otherwise the data have a formatting error. But heterogeneous data abound (today’s NoSQL databases, for example), and Miller handles them.
For I/O

CSV and pretty-print
When there is a schema change (that is, a record’s field names differ from those of the record before it), Miller simply prints a blank line and a new header. When there is no schema change, you get ordinary CSV as a special case. Likewise, Miller reads heterogeneous CSV or pretty-print input the same way. The difference between CSV and CSV-lite is that the former is RFC 4180-compliant, while the latter readily handles heterogeneous data (which is not RFC 4180-compliant). For example:
$ cat data/het.dkvp
resource=/path/to/file,loadsec=0.45,ok=true
record_count=100,resource=/path/to/file
resource=/path/to/second/file,loadsec=0.32,ok=true
record_count=150,resource=/path/to/second/file
resource=/some/other/path,loadsec=0.97,ok=false
$ mlr --ocsvlite cat data/het.dkvp
resource,loadsec,ok
/path/to/file,0.45,true

record_count,resource
100,/path/to/file

resource,loadsec,ok
/path/to/second/file,0.32,true

record_count,resource
150,/path/to/second/file

resource,loadsec,ok
/some/other/path,0.97,false
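The CSV-lite behavior above can be modeled in a few lines. The following Python sketch is not Miller’s implementation. It assumes that a "schema change" means the ordered list of field names differs from the previous record’s, and it uses a simplified DKVP parser (the helper names `parse_dkvp` and `write_csvlite` are hypothetical) that assumes values contain no commas or `=` signs.

```python
# Minimal sketch (not Miller's code): CSV-lite-style output for heterogeneous records.
HET_DKVP = """\
resource=/path/to/file,loadsec=0.45,ok=true
record_count=100,resource=/path/to/file
resource=/path/to/second/file,loadsec=0.32,ok=true
record_count=150,resource=/path/to/second/file
resource=/some/other/path,loadsec=0.97,ok=false"""


def parse_dkvp(line):
    """Split one DKVP line into an insertion-ordered dict (assumes no ',' or '=' in values)."""
    return dict(pair.split("=", 1) for pair in line.split(","))


def write_csvlite(records):
    """Emit a header whenever the field-name list changes, with a blank line before each new header."""
    lines = []
    prev_keys = None
    for rec in records:
        keys = list(rec)
        if keys != prev_keys:
            if prev_keys is not None:
                lines.append("")  # schema change: blank line, then the new header
            lines.append(",".join(keys))
            prev_keys = keys
        lines.append(",".join(rec.values()))
    return "\n".join(lines)


records = [parse_dkvp(line) for line in HET_DKVP.splitlines()]
print(write_csvlite(records))
```

Because consecutive records in this file alternate between two schemas, every record gets its own header; bringing same-schema records together first, as the group-like verb does, reduces the number of headers.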
$ mlr --opprint cat data/het.dkvp
resource      loadsec ok
/path/to/file 0.45    true

record_count resource
100          /path/to/file

resource             loadsec ok
/path/to/second/file 0.32    true

record_count resource
150          /path/to/second/file

resource         loadsec ok
/some/other/path 0.97    false
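Pretty-print output differs from CSV-lite mainly in that each column is padded to its widest cell, so a writer has to buffer each run of same-schema records before it can print the run. The Python sketch below illustrates that idea; it is not Miller’s implementation. It assumes a schema is the ordered list of field names, that the last column is not padded with trailing spaces, and it uses hypothetical helpers (`parse_dkvp`, `format_block`, `write_pprint`) with a simplified DKVP parser.

```python
# Minimal sketch (not Miller's code): pretty-print-style output, one aligned block per schema run.
HET_DKVP = """\
resource=/path/to/file,loadsec=0.45,ok=true
record_count=100,resource=/path/to/file
resource=/path/to/second/file,loadsec=0.32,ok=true
record_count=150,resource=/path/to/second/file
resource=/some/other/path,loadsec=0.97,ok=false"""


def parse_dkvp(line):
    """Split one DKVP line into an insertion-ordered dict (assumes no ',' or '=' in values)."""
    return dict(pair.split("=", 1) for pair in line.split(","))


def format_block(keys, rows):
    """Left-align every column to its widest cell, one space between columns."""
    table = [keys] + rows
    widths = [max(len(row[i]) for row in table) for i in range(len(keys))]
    # rstrip() drops the padding after the last column (an assumption about the output style).
    return [" ".join(cell.ljust(w) for cell, w in zip(row, widths)).rstrip() for row in table]


def write_pprint(records):
    """Batch consecutive records with identical field lists, then print each batch as one block."""
    blocks = []  # list of (keys, rows)
    for rec in records:
        keys = list(rec)
        if blocks and blocks[-1][0] == keys:
            blocks[-1][1].append(list(rec.values()))
        else:
            blocks.append((keys, [list(rec.values())]))
    return "\n\n".join("\n".join(format_block(keys, rows)) for keys, rows in blocks)


records = [parse_dkvp(line) for line in HET_DKVP.splitlines()]
print(write_pprint(records))
```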
You may also find Miller’s group-like verb handy (see also the Reference):
$ mlr --ocsvlite group-like data/het.dkvp
resource,loadsec,ok
/path/to/file,0.45,true
/path/to/second/file,0.32,true
/some/other/path,0.97,false

record_count,resource
100,/path/to/file
150,/path/to/second/file
$ mlr --opprint group-like data/het.dkvp
resource             loadsec ok
/path/to/file        0.45    true
/path/to/second/file 0.32    true
/some/other/path     0.97    false

record_count resource
100          /path/to/file
150          /path/to/second/file
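The idea behind group-like can be sketched as: bucket records by their field names, keep the buckets in the order each schema first appears, and keep records in input order within each bucket. This Python sketch is an illustration, not Miller’s implementation; it assumes "same schema" means the same field names in the same order, and `parse_dkvp`, `group_like`, and `to_dkvp` are hypothetical helpers.

```python
# Minimal sketch (not Miller's code) of grouping records that share a field-name list.
HET_DKVP = """\
resource=/path/to/file,loadsec=0.45,ok=true
record_count=100,resource=/path/to/file
resource=/path/to/second/file,loadsec=0.32,ok=true
record_count=150,resource=/path/to/second/file
resource=/some/other/path,loadsec=0.97,ok=false"""


def parse_dkvp(line):
    """Split one DKVP line into an insertion-ordered dict (assumes no ',' or '=' in values)."""
    return dict(pair.split("=", 1) for pair in line.split(","))


def group_like(records):
    """Return records grouped by field-name tuple; groups appear in first-seen order."""
    groups = {}  # tuple of field names -> list of records (dicts preserve insertion order)
    for rec in records:
        groups.setdefault(tuple(rec), []).append(rec)
    return [rec for group in groups.values() for rec in group]


def to_dkvp(rec):
    """Serialize a record back to one DKVP line."""
    return ",".join(f"{key}={value}" for key, value in rec.items())


records = [parse_dkvp(line) for line in HET_DKVP.splitlines()]
for rec in group_like(records):
    print(to_dkvp(rec))
```

Note that grouping has to see the whole input before it can emit the last group, so unlike the plain cat examples it is not fully streaming.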
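Key-value-pair, vertical-tabular, and index-numbered formats
For these formats, record-heterogeneity comes naturally: DKVP and XTAB output carry each field’s name alongside its value, and NIDX output has no header at all, so a change of schema needs no special handling: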
$ cat data/het.dkvp
resource=/path/to/file,loadsec=0.45,ok=true
record_count=100,resource=/path/to/file
resource=/path/to/second/file,loadsec=0.32,ok=true
record_count=150,resource=/path/to/second/file
resource=/some/other/path,loadsec=0.97,ok=false
$ mlr --onidx --ofs ' ' cat data/het.dkvp
/path/to/file 0.45 true
100 /path/to/file
/path/to/second/file 0.32 true
150 /path/to/second/file
/some/other/path 0.97 false
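NIDX output drops the field names and joins the values with the output field separator, so records with different schemas need no header at all. A Python sketch of that behavior, not Miller’s code; the `ofs` parameter stands in for `--ofs`, and `parse_dkvp` and `write_nidx` are hypothetical helpers:

```python
# Minimal sketch (not Miller's code): NIDX-style output, values only.
HET_DKVP = """\
resource=/path/to/file,loadsec=0.45,ok=true
record_count=100,resource=/path/to/file
resource=/path/to/second/file,loadsec=0.32,ok=true
record_count=150,resource=/path/to/second/file
resource=/some/other/path,loadsec=0.97,ok=false"""


def parse_dkvp(line):
    """Split one DKVP line into an insertion-ordered dict (assumes no ',' or '=' in values)."""
    return dict(pair.split("=", 1) for pair in line.split(","))


def write_nidx(records, ofs=" "):
    """Join each record's values with ofs; field names are discarded."""
    return "\n".join(ofs.join(rec.values()) for rec in records)


records = [parse_dkvp(line) for line in HET_DKVP.splitlines()]
print(write_nidx(records))
```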
$ mlr --oxtab cat data/het.dkvp
resource /path/to/file
loadsec  0.45
ok       true

record_count 100
resource     /path/to/file

resource /path/to/second/file
loadsec  0.32
ok       true

record_count 150
resource     /path/to/second/file

resource /some/other/path
loadsec  0.97
ok       false
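XTAB output prints one "name value" line per field, so each record describes its own schema. In the listing above the names are padded to the longest name within each record, and records are separated by blank lines. The following Python sketch reproduces that layout under those assumptions; it is not Miller’s code, and `parse_dkvp` and `write_xtab` are hypothetical helpers.

```python
# Minimal sketch (not Miller's code): XTAB-style output, one line per field.
HET_DKVP = """\
resource=/path/to/file,loadsec=0.45,ok=true
record_count=100,resource=/path/to/file
resource=/path/to/second/file,loadsec=0.32,ok=true
record_count=150,resource=/path/to/second/file
resource=/some/other/path,loadsec=0.97,ok=false"""


def parse_dkvp(line):
    """Split one DKVP line into an insertion-ordered dict (assumes no ',' or '=' in values)."""
    return dict(pair.split("=", 1) for pair in line.split(","))


def write_xtab(records):
    """Print 'name value' lines, names padded per record, with a blank line between records."""
    blocks = []
    for rec in records:
        width = max(len(key) for key in rec)  # alignment is computed within each record only
        blocks.append("\n".join(f"{key.ljust(width)} {value}" for key, value in rec.items()))
    return "\n\n".join(blocks)


records = [parse_dkvp(line) for line in HET_DKVP.splitlines()]
print(write_xtab(records))
```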
$ mlr --oxtab group-like data/het.dkvp
resource /path/to/file
loadsec  0.45
ok       true

resource /path/to/second/file
loadsec  0.32
ok       true

resource /some/other/path
loadsec  0.97
ok       false

record_count 100
resource     /path/to/file

record_count 150
resource     /path/to/second/file
For processing
Miller operates on the fields you specify and takes the rest along. For example, if you are sorting on the count field, then all records in the input stream must have a count field, but the other fields can vary; moreover, the sorted-on field name(s) need not be in the same position on each line:
$ cat data/sort-het.dkvp
count=500,color=green
count=600
status=ok,count=250,hours=0.22
status=ok,count=200,hours=3.4
count=300,color=blue
count=100,color=green
count=450
$ mlr sort -n count data/sort-het.dkvp
count=100,color=green
status=ok,count=200,hours=3.4
status=ok,count=250,hours=0.22
count=300,color=blue
count=450
count=500,color=green
count=600
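Field position doesn’t matter because each record is looked up by field name, not by column index. The following Python sketch of a numeric sort is an illustration, not Miller’s implementation. It assumes every record has the sort field, as the text requires (here a missing field raises `KeyError`), and `parse_dkvp`, `to_dkvp`, and `sort_numeric` are hypothetical helpers.

```python
# Minimal sketch (not Miller's code): numeric sort on one named field of heterogeneous records.
SORT_HET_DKVP = """\
count=500,color=green
count=600
status=ok,count=250,hours=0.22
status=ok,count=200,hours=3.4
count=300,color=blue
count=100,color=green
count=450"""


def parse_dkvp(line):
    """Split one DKVP line into an insertion-ordered dict (assumes no ',' or '=' in values)."""
    return dict(pair.split("=", 1) for pair in line.split(","))


def to_dkvp(rec):
    """Serialize a record back to one DKVP line, keeping its original field order."""
    return ",".join(f"{key}={value}" for key, value in rec.items())


def sort_numeric(records, field):
    """Stable ascending sort by the numeric value of `field`; other fields ride along untouched."""
    return sorted(records, key=lambda rec: float(rec[field]))


records = [parse_dkvp(line) for line in SORT_HET_DKVP.splitlines()]
for rec in sort_numeric(records, "count"):
    print(to_dkvp(rec))
```

Python’s `sorted` is stable, so records with equal counts would keep their input order.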